Computational Science and High Performance Computing

88

Notes on Numerical Fluid Mechanics and Multidisciplinary Design (NNFM)

Editors E. H. Hirschel/München K. Fujii/Kanagawa W. Haase/München B. van Leer/Ann Arbor M. A. Leschziner/London M. Pandolfi/Torino J. Periaux/Paris A. Rizzi/Stockholm B. Roux/Marseille Y. I. Shokin/Novosibirsk

Computational Science and High Performance Computing
Russian-German Advanced Research Workshop, Novosibirsk, Russia, September 30 to October 2, 2003

Egon Krause Yurii I. Shokin Michael Resch Nina Shokina (Editors)


Professor em. Professor h.c. Dr. Egon Krause Aerodynamisches Institut RWTH Aachen Wüllnerstr. zw. 5 und 7 52062 Aachen Germany

Professor Dr. Yurii I. Shokin Siberian Branch of the Russian Academy of Sciences Institute of Computational Technologies Ac. Lavrentyeva Ave. 6 630090 Novosibirsk Russia

Professor Dr. Michael Resch Dr.-Ing. Nina Shokina High Performance Computing Center Stuttgart HLRS Allmandring 30 70550 Stuttgart Germany

ISBN 3-540-24120-5 Springer Berlin Heidelberg New York
Library of Congress Control Number: 2004116859

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitations, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springeronline.com
© Springer-Verlag Berlin Heidelberg 2005
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Digital data supplied by editors
Cover design: deblik Berlin
Printed on acid free paper 89/3141/M - 5 4 3 2 1 0

NNFM Editor Addresses

Prof. Dr. Ernst Heinrich Hirschel (General editor), Herzog-Heinrich-Weg 6, D-85604 Zorneding, Germany. E-mail: [email protected]

Prof. Dr. Kozo Fujii, Space Transportation Research Division, The Institute of Space and Astronautical Science, 3-1-1, Yoshinodai, Sagamihara, Kanagawa, 229-8510, Japan. E-mail: [email protected]

Dr. Werner Haase, Höhenkirchener Str. 19d, D-85662 Hohenbrunn, Germany. E-mail: [email protected]

Prof. Dr. Bram van Leer, Department of Aerospace Engineering, The University of Michigan, Ann Arbor, MI 48109-2140, USA. E-mail: [email protected]

Prof. Dr. Michael A. Leschziner, Imperial College of Science, Technology and Medicine, Aeronautics Department, Prince Consort Road, London SW7 2BY, U. K. E-mail: [email protected]

Prof. Dr. Maurizio Pandolfi, Politecnico di Torino, Dipartimento di Ingegneria Aeronautica e Spaziale, Corso Duca degli Abruzzi, 24, I-10129 Torino, Italy. E-mail: [email protected]

Prof. Dr. Jacques Periaux, Dassault Aviation, 78, Quai Marcel Dassault, F-92552 St. Cloud Cedex, France. E-mail: [email protected]

Prof. Dr. Arthur Rizzi, Department of Aeronautics, KTH Royal Institute of Technology, Teknikringen 8, S-10044 Stockholm, Sweden. E-mail: [email protected]

Dr. Bernard Roux, L3M – IMT La Jetée, Technopole de Chateau-Gombert, F-13451 Marseille Cedex 20, France. E-mail: [email protected]

Prof. Dr. Yurii I. Shokin, Siberian Branch of the Russian Academy of Sciences, Institute of Computational Technologies, Ac. Lavrentyeva Ave. 6, 630090 Novosibirsk, Russia. E-mail: [email protected]

Preface

This volume is published as the proceedings of the Russian-German Advanced Research Workshop on Computational Science and High Performance Computing, held in Novosibirsk Academgorodok in September 2003. The contributions to these proceedings were provided and edited by the authors and were chosen after careful selection and review. The workshop was organized by the Institute of Computational Technologies SB RAS (Novosibirsk, Russia) and the High Performance Computing Center Stuttgart (Stuttgart, Germany). The objective was to discuss the latest results in computational science and to develop close cooperation between Russian and German specialists in this field.

The main directions of the workshop are associated with the problems of computational hydrodynamics, the application of mathematical methods to the development of a new generation of materials, environmental protection problems, the development of algorithms, software and hardware support for high-performance computation, and the design of modern facilities for visualization of computational modelling results.

The importance of the workshop topics was confirmed by the participation of representatives of major research organizations engaged in the solution of the most complex problems of mathematical modelling and in the development of new algorithms, programs and key elements of new information technologies. Among the Russian participants were researchers of the Institutes of the Siberian Branch of the Russian Academy of Sciences: Institute of Computational Technologies, Institute of Computational Mathematics and Mathematical Geophysics, Institute of Computational Modelling, Russian Federal Nuclear Center, All-Russian Research Institute of Experimental Physics, Kemerovo State University.
Among the German participants were the heads and leading specialists of the High Performance Computing Center Stuttgart (HLRS) (University of Stuttgart), Institute of Hydraulic Fluid Mechanics (University of Stuttgart), Institute of Aerodynamics and Gasdynamics (University of Stuttgart), Center for High Performance Computing (ZHR) (Dresden University of Technology), Institute of Aerodynamics RWTH (Aachen), Institute of Applied Mathematics (University of Freiburg i. Br.), Institute of Astronomy and Astrophysics (University of Tuebingen), Institute of Fluid Mechanics (University of Erlangen-Nuernberg).

The collaboration between Siberian and German specialists in computational science has a long, steady and successful history. The stability of such relations and their prospects are based on the active participation of young scientists, which prompted the organizers to establish a youth section in the workshop. In 2003 its participants were recent postgraduate students of Novosibirsk University and Novosibirsk Technical University and young researchers of the Institute of Computational Technologies.

The scope of the contributions is wide. Hence this volume provides state-of-the-art scientific papers, gives the opportunity to learn about the latest results of other SB RAS research institutions, and spurs discussions about the future of computational sciences and information technologies. We hope that such scientific workshops will have a brilliant future and that the topics of future workshops will be characterized by the same high level of relevance and scope. The editors would like to express their gratitude to all the participants of the workshop and wish them further successful and fruitful work.

Novosibirsk-Stuttgart, August 2004

Yurii Shokin Michael Resch

Contents

Information and telecommunication systems for emergency management
  Yu.I. Shokin, L.B. Chubarov

High Performance Computing in Engineering and Science
  M. Resch

Completely splitting method for the Navier-Stokes problem
  I.V. Kireev, U. Rüde, V.V. Shaidurov

Methods of shock wave calculation
  V.F. Kuropatenko

Distributed and collaborative visualization of simulation results
  U. Lang

Safety problems of technical objects
  V.V. Moskvichev

Direct numerical simulations of shock-boundary layer interaction at Ma = 6
  A. Pagella, U. Rist

Mathematical models of filtration combustion and their applications
  A.D. Rychkov, N.Yu. Shokina

Computer simulation at VNIIEF
  I.D. Sofronov

Mathematical modeling of optical communication lines with dispersion management
  M.P. Fedoruk

Method of particles for incompressible flows with free surface
  A.M. Frank

Direct and inverse problems in the mechanics of composite plates and shells
  S.K. Golushko

Numerical simulation of plasma-chemical reactors
  Yu.N. Grigoryev, A.G. Gorobchuk

The application of smoothed particle hydrodynamics for the simulation of diesel injection
  S. Holtwick, H. Ruder

Some features of modern computational mathematics: problems and new generation of algorithms
  Yu.M. Laevsky

Efficient flow simulation on high performance computers
  T. Zeiser, F. Durst

Simulation of problems with free surfaces by a boundary element method
  K.E. Afanasiev, S.V. Stukolov

Simulation and optimisation for hydro power
  E. Göde

The analysis of behaviour of multilayered conic shells on the basis of nonclassical models
  V.V. Gorshkov

Simulation of the motion and heating of an irregular plasma
  N.A. Huber

Numerics and simulations for convection dominated problems
  D. Kröner

Modified Finite Volume Method for Calculation of Oceanic Waves on Unstructured Grids
  A.V. Styvrin

Performance aspects on high performance computers: from microprocessors to highly parallel smp systems
  H. Mix, W.E. Nagel

List of Contributors

K.E. Afanasiev, Kemerovo State University, Krasnaya ul. 6, Kemerovo, 650043, Russia

L.B. Chubarov, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

F. Durst, Institute of Fluid Mechanics, University of Erlangen-Nuremberg, Cauerstraße 4, 91058 Erlangen, Germany

M.P. Fedoruk, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

A.M. Frank, Institute of Computational Modelling SB RAS, Academgorodok, Krasnoyarsk, 660036, Russia

E. Göde, Institute for Fluid Mechanics and Hydraulic Machinery, University of Stuttgart, Pfaffenwaldring 10, Stuttgart, 70550, Germany

S.K. Golushko, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

V.V. Gorshkov, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

Yu.N. Grigoryev, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

A.G. Gorobchuk, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

S. Holtwick, Institute of Theoretical Astrophysics, University of Tübingen, Auf der Morgenstelle 10, Tübingen, 72076, Germany

N.A. Huber, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

I.V. Kireev, Institute of Computational Modelling SB RAS, Academgorodok, Krasnoyarsk, 660036, Russia

D. Kröner, Institute of Applied Mathematics, University of Freiburg i. Br., Hermann-Herder-Str. 10, Freiburg i. Br., 79104, Germany

V.F. Kuropatenko, Russian Federal Nuclear Center, P.O. Box 245, Snezhinsk, 456770, Russia

Yu.M. Laevsky, Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

U. Lang, High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Allmandring 30, Stuttgart, 70550, Germany

H. Mix, Center for High Performance Computing (ZHR), Dresden University of Technology, Dresden, 01062, Germany

V.V. Moskvichev, Institute of Computational Modelling SB RAS, Academgorodok, Krasnoyarsk, 660036, Russia

W.E. Nagel, Center for High Performance Computing (ZHR), Dresden University of Technology, Dresden, 01062, Germany

A. Pagella, Institute of Aerodynamics and Gasdynamics, University of Stuttgart, Pfaffenwaldring 21, Stuttgart, 70550, Germany

M. Resch, High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Allmandring 30, Stuttgart, 70550, Germany

U. Rist, Institute of Aerodynamics and Gasdynamics, University of Stuttgart, Pfaffenwaldring 21, Stuttgart, 70550, Germany

U. Rüde, University of Erlangen-Nuremberg, Cauerstraße 6, Erlangen, 91058, Germany

H. Ruder, Institute of Theoretical Astrophysics, University of Tübingen, Auf der Morgenstelle 10, Tübingen, 72076, Germany

A.D. Rychkov, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

V.V. Shaidurov, Institute of Computational Modelling SB RAS, Academgorodok, Krasnoyarsk, 660036, Russia

Yu.I. Shokin, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

N.Yu. Shokina, High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Allmandring 30, Stuttgart, 70550, Germany

I.D. Sofronov, All-Russia Research Institute of Experimental Physics, Mir Ave. 37, Sarov, 607190, Russia

S.V. Stukolov, Kemerovo State University, Krasnaya ul. 6, Kemerovo, 650043, Russia

A.V. Styvrin, Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, Novosibirsk, 630090, Russia

T. Zeiser, Regional Computing Center Erlangen, University of Erlangen-Nuremberg, Martensstraße 1, Erlangen, 91058, Germany

Information and telecommunication systems for emergency management

Yu.I. Shokin (1) and L.B. Chubarov (2)

(1) Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia [email protected]
(2) Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia [email protected]

Summary. Collecting and sharing timely, reliable and accurate information during a crisis is critical to improving humanitarian response, maximizing resources and minimizing human suffering. The faster humanitarian organizations are able to collect, analyse and disseminate critical information, the more effective the response becomes and the more lives are potentially saved. Though humanitarian information functions, systems and tools have improved in the past five years, operational, funding and technical constraints, together with a lack of awareness, continue to prevent information from becoming a core, well-resourced component of relief operations.

1 Introduction

All too often, when a crisis erupts, valuable time is wasted gathering baseline information about an affected area, which is often already available on the Internet. Even more troubling are the instances in which our greatest challenge is not the lack of information but rather too much of it from too many, sometimes conflicting, sources, making it difficult to distinguish the most critical and relevant data from the less useful. Considerable progress has been made to date in developing information systems, tools and Web sites and in establishing standards for their use [110]. In particular, the ReliefWeb, Integrated Regional Information Network (IRIN) and Humanitarian Information Center (HIC) models are successful examples of international and field-level activities and services that form a solid basis for future work. But much remains to be done to build upon these approaches and continue to meet the demands of decision-makers and other stakeholders.


2 Principles of Humanitarian Information Management and Exchange

The fundamental principle is that the purpose of humanitarian assistance is to assist affected and at-risk people. Information management and exchange should reflect this humanitarian imperative and promote more effective humanitarian action. The following operational principles are identified to guide information management and exchange activities:

Accessibility. Humanitarian information and data should be made accessible to all humanitarian actors by applying easy-to-use formats and by translating information into common or local languages when necessary. Information and data for humanitarian purposes should be made widely available through a variety of online and offline distribution channels including the media.

Inclusiveness. Information management and exchange should be based on a system of collaboration, partnership and sharing with a high degree of participation and ownership by multiple stakeholders, especially representatives of the affected population.

Inter-operability. All sharable data and information should be made available in formats that can be easily retrieved, shared and used by humanitarian organizations.

Accountability. Users must be able to evaluate the reliability and credibility of data and information by knowing its source. Information providers should be responsible to their partners and stakeholders for the content they publish and disseminate.

Verifiability. Information should be accurate, consistent and based on sound methodologies, validated by external sources, and analyzed within the proper contextual framework.

Relevance. Information should be practical, flexible, responsive, and driven by operational needs in support of decision-making throughout all phases of a crisis.

Objectivity. Information managers should consult a variety of sources when collecting and analyzing information so as to provide varied and balanced perspectives for addressing problems and recommending solutions.

Humanity. Information should never be used to distort, to mislead or to cause harm to affected or at-risk populations and should respect the dignity of victims.

Timeliness. Humanitarian information should be collected, analyzed and disseminated efficiently, and must be kept current.

Sustainability. Humanitarian information and data should be preserved, cataloged and archived, so that it can be retrieved for future use, such as for preparedness, analysis, lessons learned and evaluation.

In support of these principles, a number of key themes are to be considered when developing and implementing humanitarian information management and exchange systems.


1) User Requirements. Information management systems should meet the clearly defined needs of users and decision-makers, and aim to reduce the effects of information overload.

2) Quality of Data and Information. To be useful, data and information must be relevant, accurate and timely. Ensuring quality requires the development of, and adherence to, standards for information collection, exchange, security, attribution and use. In addition, it is vital to maintain a strong sense of professional ethics at every stage of information system design and implementation, including such elements as independence and impartiality, in pursuit of humanitarian action.

3) Technology. Technology is a powerful enabler. Technology should not, however, undermine, distort or overshadow content. Achieving humanitarian objectives by using technology is not primarily a question of hardware and software, but rather of cost-effectiveness and appropriateness for achieving desired humanitarian outcomes. Information system designers should consider explicit and proactive efforts for making systems relevant and easy to use, particularly in remote areas. This includes bridging the technological divide by building capacity, promoting the exchange of knowledge and skills between local and international actors and making information available through a variety of means in a variety of formats. Human judgment, rather than technology, is the basis for operational decisions. Whereas information technology is a platform, information management is a process that includes a combination of design, data collection, data entry, data integration and management, analysis, data dissemination and output. Common methodologies and technical specifications are an integral part of any data management system because standards allow integration and analysis among different sources. Standards also allow information managers to prepare ready-made systems and tools for response.

4) Partnerships. Successful information management systems encourage openness, inclusiveness and sharing. This strengthens relations, trust and coordination among multiple stakeholders. Multiple information systems, including Web sites and databases, operating at global, regional and local levels, create the potential for an unprecedented degree of cooperation between organizations and people at the field level, between the field and headquarters and between the international and local communities. Global information systems are particularly important for building trust and achieving buy-in at an institutional level. These systems have succeeded in building partnerships around data and document repositories and by creating online communities. For example, OneWorld (http://www.oneworld.net) acts as a “meeting place” for its 1000 partners who can use the site to reach a monthly audience of hundreds of thousands of journalists, broadcasters, educators, aid workers and members of the public. ReliefWeb relies on relationships with more than 700 content partners to keep its database of 150,000 documents on more than 40 humanitarian emergencies fresh and balanced. The World Bank’s Gateway (http://www.worldbank.org/gateway) has created a data repository that consolidates statistics from national and international organizations that offer data, or information about data, on their sites. Finally, AlertNet created the Professional Zone, a password-protected area of the site that allows its 172 members to post news, comments and contacts of use to other humanitarian professionals.

5) Preparedness. One of the most important aspects of humanitarian information management and exchange is preparation. Information-related efforts that are incrementally resourced and initiated only as emergency situations unfold tend to remain behind the curve and reactive. This leads to a failure to provide timely information that is accurate and contextual. Preparedness measures such as base data preparation for high-risk areas, national-level capacity building and the formation of institutional relationships prior to deployment enable information management and exchange systems to effectively support assistance efforts once an emergency begins. Preparation also includes planning for sustainability and/or exit strategies.
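
The accountability and timeliness principles and the data-quality theme above ask that every piece of information carry a known source and be kept current. As a minimal sketch of how such a rule might be checked automatically, the Python fragment below flags records that lack a source or a collection date, or that have gone stale. The record layout (`FieldReport`), its field names and the 30-day freshness threshold are illustrative assumptions only; they are not taken from any standard or system named in this chapter.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class FieldReport:
    """One piece of humanitarian information (hypothetical layout)."""
    content: str
    source: Optional[str]         # who provided it (accountability)
    collected_on: Optional[date]  # when it was collected (timeliness)


def quality_problems(report: FieldReport, today: date,
                     max_age_days: int = 30) -> List[str]:
    """Return a list of quality problems; an empty list means none found.

    The 30-day default is an assumed threshold, not a recommended value.
    """
    problems: List[str] = []
    if not report.source:
        problems.append("missing source")
    if report.collected_on is None:
        problems.append("missing collection date")
    elif report.collected_on > today:
        problems.append("collection date in the future")
    elif today - report.collected_on > timedelta(days=max_age_days):
        problems.append("stale")
    return problems
```

A check of this kind only covers what can be tested mechanically; whether a source is credible or a report is relevant remains a matter of human judgment, as the technology theme stresses.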

3 Best Practices

The following is a set of best practices derived from the principles and themes summarized above and identified as integral to the future success of humanitarian information management and exchange. In complex emergencies and natural disasters, the humanitarian community should:

Define user needs and emphasize data sets and formats that directly support decision-making at the field level. Identify user groups, conduct user requirement analysis, inventory information resources and define core information products based on user input. Develop and implement information products on operationally relevant themes, such as the location and condition of the affected population, “who is doing what, where?” and factors affecting access to affected populations. Use templates such as the Rapid Village Assessment (RVA) tool to speed data collection. Create maps to effectively communicate information to decision-makers.

Collect and analyze base data and information before and throughout an emergency. Gather, organize and archive data and information on operationally relevant themes for high-risk areas in preparation for emergencies. Maintain and enhance data sets during emergency responses. Document and archive data so that it is easily accessible for future use.

Maintain and promote data and information standards. Follow generally accepted standards for information exchange, such as the Structured Humanitarian Assistance Reporting (SHARE) standard to promote data sourcing, dating and geo-referencing. The SHARE standard facilitates integration of data from multiple sources and enhances verifiability, assessment, analysis and accountability. Geo-referencing data during collection allows cartographic presentation and geographic information system (GIS) analysis. Create metadata catalogs as part of a standard documentation process with handover procedures.

Maximize resources by expanding partnerships. Recognize that data and information are collected and managed by a variety of actors including national governments, UN agencies, NGOs, the private sector and research institutions and that the contributions of these providers are crucial. Pre-establish inter-agency agreements and relationships at the national and local levels. Establish an ongoing process of personal interaction to create partnerships for information management and exchange. Use distributed networks and neutral portal repositories to assist with information sharing and promote linkages to avoid duplication of effort.

Engage local and national actors in information projects. Develop networks of local communities and national NGOs, civil society groups and the private sector and address the issue of local participation as part of overall emergency planning, monitoring and evaluation. Build and strengthen the national/local capacity in information management and exchange and promote the transfer and use of local knowledge.

Maintain preparedness “toolboxes” for online and offline distribution. These toolboxes provide guidelines and reference tools for the rapid deployment of HICs or the establishment of Web sites and databases under a variety of field conditions. Toolboxes should include data standards, operating procedures, training materials, database templates and manuals.

Define an exit strategy. Develop a clear phase-out strategy, including transition to development activities and creation of archiving systems to maintain access by current and future stakeholders after the project is closed.

Preserve institutional operational memory. Define and adhere to sound data and information management policies and techniques for handling large volumes of information. Document datasets with metadata. Maintain quality control and organizational learning to avoid the need to start from scratch with each emergency and to maintain quality of information services during emergencies.

Establish field-based HICs according to identified operational and decision-making demand. Design them as open-access physical locations and incorporate existing capacities, systems and information management activities. Serve as a neutral broker of humanitarian information, providing value-added products and beneficial services to the field-based humanitarian community. Encourage broad participation from local, national and international actors to facilitate and support humanitarian response activities. Form partnerships with specialized agencies and sector experts to conduct sectoral surveys and analyses.


Use appropriate technology. Ensure that field information systems reach the broadest possible audience. Be aware of the limitations of technology (both inherent and as related to availability). For example, keep in mind that the Internet, while powerful, is not a panacea and can be ineffective as a distribution channel to and from remote areas. Consider making data products, particularly databases, available via e-mail, CD-ROM and for local download. Recognize that local staff’s ability to work with the technology is an important determinant of success. Technology should be easy to use and be accompanied by training for local staff. It is necessary to develop a clear, phase-out strategy, including transition to development activities and creation of archiving systems to maintain access by current and future stakeholders after the project is closed. Recent advancements in the sophistication, speed and portability of information technology, satellite communications and GIS mapping tools make up-to-the-minute information analysis, verification, extraction and distribution both possible and powerful. The proliferation of Internet technologies and evolution of the Web have allowed humanitarians to quickly and cost-effectively reach a global audience, from both headquarters and, increasingly, from the field. Frequently, competing and proprietary formats make file-sharing cumbersome, a lack of standards makes data collection inefficient, while disparities in connectivity and technical ability make information inaccessible to those who need it most. Achieving humanitarian objectives via technology is therefore not a question of hardware and software, but rather of access and appropriateness. One of the greatest challenges to information systems, particularly in the field, is access. Even the most robust databases and powerful search engines are worthless if users are unable to retrieve the information in or through them. 
Information system designers should make explicit, proactive efforts to keep systems relevant and easy to use, particularly in remote areas. To this end, information managers should forgo the latest systems and tools in favor of technologies that enable the broadest possible use and reach. The Internet, though powerful, is not always the best distribution channel, particularly in remote areas. While field missions may not always have Internet, satellite or cellular connections, most do have access to laptops. For this reason, information specialists in the field should distribute their data products, including databases, encyclopedias, maps and assessments, on CD-ROM. This achieves a wide distribution at little marginal cost. On the other hand, the Internet is effective at feeding information from the field back to headquarters. Wherever possible, information products should be developed in Internet-ready formats.
Use open data formats and inter-operable technologies. Use commercial, off-the-shelf technology and create all information products using open data formats and inter-operable technologies.
Promote awareness and training. Conduct technology training sessions for non-technical humanitarian staff, particularly national staff. Educate senior decision-makers in humanitarian organizations about the purpose, strengths and weaknesses of information management and exchange. Broaden participation in information projects among affected and at-risk populations.
Involve the private sector. Consider the efficiencies of contracting information management and exchange functions to the private sector, especially local private interests, when cost-effective and appropriate. Encourage a constructive role for the private sector by incorporating private-sector expertise into preparedness and planning activities. Humanitarian information systems have yet to tap into the full potential of the private sector and academia, particularly in the area of hardware and software development. However, some significant private sector and academic initiatives have found humanitarian applications. For example, Microsoft is working with Mercy Corps and Save the Children to develop logistics tracking and needs assessment software packages for use with PDAs. Benetech, a Silicon Valley-based technology non-profit, built the Martus Project (http://www.martus.org), a system that makes the storage and retrieval of data on human rights violations more efficient, in order to speed up the response to violations and, in some cases, prevent additional abuses. In addition, ESRI, a California-based GIS and mapping software company, has developed the Geography Network (GeoNet), a global network that enables the sharing of geographic information between data providers, service providers and users around the world. Through the Geography Network, users can both access and post many types of geographic content, including live maps and downloadable data. FAO is already using GeoNet with a community of experts around the world who need access to data to create maps of disasters.
Among academic institutions, the University of Georgia’s Information Technology Outreach Services (ITOS), under contract with the GIST, is currently working with the Afghanistan Information Management Service (AIMS), the Sierra Leone Information System (SLIS) and the Data Platform for the Horn of Africa (DEPHA) to manage and host a data repository of critical high-memory graphics, satellite imagery and metadata files. Cambridge-based aidcommunity.org (http://www.aidcommunity.org) allows aid workers in the field to access both the Web and each other and provides them with the easy-to-read information packages they need during relief operations. It is necessary to encourage a constructive role for the private sector and academia by incorporating their expertise into preparedness and planning activities.
Mobilize adequate resources. Include funding for field-level information management and exchange systems and projects in the overall resourcing of assistance programs.

3.1 Recommendations and Follow-Up Actions

Specific areas to be addressed through this follow-up process include:

User requirements. Explore the linkages between data, information and decision-making in critical areas, such as assessments, “who is doing what, where?” and other operational information, particularly in the field. Improve the exchange of data and information collected during natural disasters and complex emergencies for operational purposes as well as to strengthen the database on global disaster impacts over the long term.
Quality of information. Develop and disseminate standards, ethical guidelines and codes of conduct to address issues of data quality and information integrity.
Technology. Evaluate and report on successful applications of new and existing technologies. Identify technology partners and promote the dissemination of appropriate technology practices for varying end uses. Discuss the application of these technologies in a future forum.
Partnerships. Strengthen the linkages among existing information systems. Improve relationships between these systems and their stakeholders, including decision-makers at the field and headquarters level, as well as with the affected population. Establish public-private partnerships, especially in the area of systems and tools development. Define the roles of sector specialists and the media.
Preparedness. Promote the preparation of base data for high-risk areas. Calculate and disseminate risk assessments, build national capacity and develop toolboxes for rapid mobilization of HICs. Raise donor and, where appropriate, media awareness of the importance of information preparedness to humanitarian action.
Field-level coordination. Improve field-level information coordination among multiple actors, including the UN resident coordinator and UN country team, NGOs, academia, the affected population and other stakeholders. Facilitate OCHA’s role as an information field focal point or partner. Evaluate and implement field-level information policies such as access and exit strategies.
Related to these is the development of, and adherence to, procedural, technical and ethical standards for information collection, exchange, security, attribution and use. Using standards allows information managers to better handle the large volumes of data and information generated during a crisis, to ensure the integrity of the data and to avoid having to start from scratch every time an emergency erupts. To better meet the demands of decision makers and other stakeholders, information projects should also reduce the effects of information overload and serve both operational and strategic needs of decision makers at all levels. To achieve this, information managers should identify their target audiences and create high-value products to improve data collection, synthesis and analysis. The creation of tools such as the Who Is Doing What Where database, the development of standards under the Structured Humanitarian Assistance Reporting (SHARE) and Global Identifier Numbers (GLIDE) initiatives and the application of these standards to geographic information and mapping, are identified as some ways to improve operational effectiveness.

Another critical aspect of humanitarian information management and exchange is preparedness. Information flow during a crisis can be crippled by lack of time, few resources, isolated decision-making, limited information sources and sparse communication among actors, making it difficult to gather and process accurate data in a timely way. The preparation of baseline data for high-risk areas, the development of toolkits for the deployment of rapid-response humanitarian information centres (HICs) and the coordination between international and national partners in the field contribute to a more efficient and effective response. A humanitarian information centre (HIC) is a physical meeting space, staffed by specialists, where humanitarian actors can go to get their questions addressed. To be effective, HICs should be seen as reliable, trusted information sources that are integral parts of the inter-agency coordination structure. More than just information clearinghouses, HICs should be service-based and provide value back to the data providers that supply them. As a matter of principle, HICs should always:
∗ serve operational needs in that they are practical, flexible, relevant, responsive and timely;
∗ become integral parts of the decision-making process, including education and development of institutional capacity; and
∗ be supported by general principles of accessibility (location, language, format, outreach), reliability, accuracy (consistency and context) and interoperability.
National and local partnerships should also be considered when procuring and applying technology. Even the most robust databases and search engines are useless if few are able to access or use them.
Information managers and technicians should therefore refrain from adopting the latest, bleeding-edge technologies in favour of robust, well-tested and cost-effective tools that allow for the broadest possible use and reach, conform to the sophistication of the local infrastructure and allow for appropriate training for technical and non-technical staff. The greatest challenge for this field is creating a culture of information sharing that promotes the systematic collection, use and free flow of data, information and ideas, facilitates informed decision-making and builds trust and commitment among stakeholders. In view of what is at stake in humanitarian operations, the consequences of not sharing information are too high to ignore. In the past five years, the use of information in humanitarian operations has come of age. Global information services such as ReliefWeb (http://www.reliefweb.int), AlertNet (http://www.alertnet.org) and IRIN (http://www.irinnews.org) have revolutionised the way humanitarian information is catalogued and disseminated. Field-based Web sites have brought international, national and local partners together to address the needs of practitioners and local populations. Though field-based systems have improved operational response, they have

been unable to address the needs of decision makers or fully engage national and local actors in the strategic and technical aspects of their work. Field-based Web sites are essential components of humanitarian operations. They should be simple and straightforward, and should directly address the needs of a clearly defined target audience. However, in order to be successful and transferable across countries and crises, key institutions and partners should develop common policies governing the establishment and maintenance of field-based Web sites. The overriding objective of all field-based Web sites is to make the coordination of humanitarian assistance more effective. Within that objective, field-based Web sites should support the following goals: facilitate humanitarian coordination, improve operations, inform decision-making and promote early warning. In addition, field-based Web sites are uniquely positioned to preserve institutional memory by constantly updating, retooling and archiving information as emergencies evolve. The principles that guide humanitarian action, in particular humanity, impartiality, neutrality and independence, should be applied to information management and exchange and, of course, to field-based Web sites. Overall, the substance of field-based Web sites should be based on the needs of practitioners and decision makers within the context of the specific crisis. There are, however, some common content pieces that should be a part of every field-based Web site, including:
∗ a Resource Centre, including a document library, map centre and database;
∗ a Community Area, including a notice board, contact list, site map and vacancies section;
∗ Support Services; and
∗ links to relevant and credible news and information sources.
To make these sites most effective, Web site producers should avoid using heavy graphics. File sizes should be indicated so that users can decide whether or not to take the time needed to download information.
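The advice to indicate file sizes can be illustrated with a short calculation. The following is only a sketch: the function names are hypothetical, and the default link speed of 56 kbit/s is an assumption standing in for a dial-up connection, not a figure from the text. Real throughput in the field would usually be lower than the nominal rate.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count as a short label such as '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            if unit == "B":
                return f"{size:.0f} {unit}"
            return f"{size:.1f} {unit}"
        size /= 1024


def download_seconds(num_bytes: int, link_bits_per_second: float) -> float:
    """Ideal transfer time: 8 bits per byte, no protocol overhead."""
    return num_bytes * 8 / link_bits_per_second


def link_label(title: str, num_bytes: int,
               link_bits_per_second: float = 56_000) -> str:
    """Build the text shown next to a download link.

    The 56 kbit/s default is an assumed dial-up speed (illustrative only).
    """
    minutes = download_seconds(num_bytes, link_bits_per_second) / 60
    speed = link_bits_per_second / 1000
    return (f"{title} ({human_size(num_bytes)}, "
            f"about {minutes:.1f} min at {speed:.0f} kbit/s)")


if __name__ == "__main__":
    # A hypothetical 2.1 MB map file offered on a field Web site.
    print(link_label("District map", 2_100_000))
```

A label of this kind lets a user on a slow connection decide before clicking whether a map or report is worth the wait.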
In order to handle the large volumes of data and information generated during a crisis, field-based Web sites should follow a pre-established set of standards that guide: Procedures, including information collection methodologies, language, etc.; Technology, including the use of hardware and software packages to ensure maximum inter-operability, development and maintenance by local staff; Data, including standards for ensuring data formats, content and quality; and Metadata, including pre-established and common standards for identifying and documenting data. Like other field-based information systems, Web sites are able to keep content fresh and balanced. To achieve support and buy-in from partners, it is important to involve partners and stakeholders in the early conceptualisation and development of the Web site, and to promote the site and “sell” the benefits of its products and services on a regular basis once the site has been established. Education and training can also enhance buy-in. In order to be sustainable, field-based Web sites should define an exit strategy early on. Such a strategy might include duplicating or “mirroring” a

site on a server outside of the host country, building in long-term partnerships and pursuing new and diverse sources of income. Products such as standardized survey forms, assessments, standard operational procedures, place – or “p” – codes (unique numeric codes that identify geographic locations), geo-spatial analyses and Who Is Doing What Where (WWW) databases have been successful at addressing some of the needs of practitioners and improving operational effectiveness in the field. Geographic Information Systems (GIS) have also become powerful analytic and common reference instruments when applied to both complex emergencies and natural disasters. Products such as situation reports and appeals distributed via global information services have been successful at synthesizing operational information and communicating field situations to headquarters. This gives decision makers at headquarters and in donor organizations an accurate and evolving picture of what is taking place on the ground. However, an uneven information flow during an emergency makes it difficult to serve the needs of decision makers. At the core of this issue are information analysis and data collection. On the one hand, a glut of narrative, non-operational information is overwhelming and therefore useless when it comes to quick decision-making and timely action. On the other hand, data necessary for operational decision-making is not available in the form or level of detail required. Information systems, particularly HICs, should better reduce the effects of information overload and facilitate data synthesis and analysis by providing experts with technical and mapping support, by applying their knowledge of data sets to better package analysis and by playing a coordinating role in collecting and disseminating this analysis to others. 
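To make the idea of place codes and Who Is Doing What Where databases more concrete, the following minimal sketch shows one way such records could be indexed by p-code. It is an illustration under stated assumptions, not a description of any deployed system: the class names, the sample organizations and the four-digit codes are all hypothetical, and p-codes are stored as strings of digits so that leading zeros are preserved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    organization: str  # who
    sector: str        # what
    pcode: str         # where (hypothetical place code)


class WhoWhatWhere:
    """A tiny in-memory index of activities keyed by place code."""

    def __init__(self) -> None:
        self._by_pcode = defaultdict(list)

    def add(self, activity: Activity) -> None:
        self._by_pcode[activity.pcode].append(activity)

    def who_is_in(self, pcode: str) -> list:
        """Return the organizations recorded at a place, sorted by name."""
        found = self._by_pcode.get(pcode, [])
        return sorted({a.organization for a in found})

    def gaps(self, pcodes: list, sector: str) -> list:
        """Return the place codes with no recorded activity in a sector."""
        return [
            p for p in pcodes
            if not any(a.sector == sector for a in self._by_pcode.get(p, []))
        ]


if __name__ == "__main__":
    db = WhoWhatWhere()
    db.add(Activity("Agency A", "water", "1001"))
    db.add(Activity("Agency B", "health", "1001"))
    db.add(Activity("Agency A", "health", "1002"))
    print(db.who_is_in("1001"))
    print(db.gaps(["1001", "1002", "1003"], "water"))
```

Because every record refers to a location by the same code rather than by a free-text place name, data from different organizations can be combined and mapped without first reconciling spellings.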
In addition, information systems should secure, streamline and standardize the process of collecting relevant data in a way that allows senior managers to easily grasp the issues and apply the analysis quickly and to good effect.
Operational response. By converting raw data into rich databases and dynamic maps, developing common procedures for surveys and needs assessments and designing data repositories with rapid retrieval search applications, humanitarian information systems are improving the operational effectiveness of humanitarian practitioners.
Capacity building. But perhaps the greatest challenge for this field is creating a culture of information sharing that promotes the free flow of data, information and ideas, facilitates informed decision-making and builds trust and commitment among stakeholders. More than a task for steering committees and working groups, it is a process that requires strong leadership, vision and investment. Information management is not a set of discrete tasks, but a process that underpins all aspects of humanitarian response and requires long-term institutional support and ample, sustained investment. Considering what is at stake in humanitarian operations, the consequences of not sharing information are too high to ignore.

Just as the uncoordinated arrival of relief supplies can clog a country’s logistics and distribution system, the onslaught of unwanted, inappropriate and unpackaged information can impede decision-making and rapid response to an emergency. These challenges highlight the need for more systematic ways to process and standardize information, as well as to begin information gathering and sharing on vulnerable countries well in advance of crises.

4 Role of new communication and information technologies

From a physical perspective, it is anticipated that significant damage would be sustained by basic infrastructure such as transportation networks, utilities and buildings (including some emergency operations centres, EOCs). From a communication perspective, it is anticipated that telephone (including cellular) service would be severely degraded because of network congestion and physical damage. Two-way radio communication services would also be impacted by physical damage, loss of power and congestion. Equally problematic is the lack of technical compatibility among agency radio systems, which operate on unique radio frequencies and share few common frequencies to support inter-agency coordination. These problems would likely result in emergency managers being unable to reach designated emergency operations centres for considerable periods of time and, in the meantime, being kept out of the information flows, and hence the decision-making structures, in which they play critical roles. New developments in wireless and fixed information networking open significant opportunities for addressing some of these EOC participation problems, especially in helping to integrate and provide alternative means of access to emergency management information systems. The application of digital communication techniques and the adoption of common communication protocols are bringing about a revolution in communication networking and electronic information sharing. These developments are also spawning the convergence of previously independent communication media such as radio and television broadcasting, computers and wired and wireless telecommunications systems to forge new forms of addressable and personalized communications services, linking private and public organizations all over the world and laying the foundations of new information highways.
Traffic over these networks is translated into packets of data which are controlled electronically rather than physically and flow over “virtual” networks created and flexibly managed by computer software. The result is that the same information can now be addressed and sent over a variety of communication media and, if the system is properly designed and implemented, sent with a high degree of accuracy and speed. This also means that around-the-clock access to information services can be provided from fixed or mobile and remote locations and increasingly through a variety of substitutable telecommunications means, including cabled and wireless facilities. These new facilities are bringing about widespread change to emergency management practices. Few emergency management agencies today do not use automated information processing techniques or are not becoming increasingly reliant upon electronic networking to support both intra- and inter-agency communication requirements. These changes are not entirely due to conscious decisions by emergency managers to embrace these technologies for emergency management purposes; they are also influenced by the larger societal considerations of local, regional and national governments, which view investments in national information infrastructure (highways) as strategically important to achieving broader social and economic goals. For the emergency management community, a key consideration is determining how to apply these advanced systems in the struggle to lessen the vulnerability of societies and ecosystems to natural and technologically based hazards. In this new environment, the challenge is not to determine how to construct proprietary networks specifically designed for emergency managers, but rather to determine how to add value to emerging inter-connectable and addressable networks over existing telecommunications networks. Such a proposition, however, calls for greater cross-representation among local, provincial, national and international emergency management programs and processes and network and application developers and administrators.
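The packet principle described in this section can be sketched in a few lines. The example below is a simplified illustration, not a real network protocol: the `Packet` fields, the function names and the eight-byte payload size are assumptions chosen for readability. It shows why the same message can travel over any medium and still be restored exactly, since each packet carries enough addressing information to be put back in order on arrival.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    message_id: int  # which message this packet belongs to
    sequence: int    # position of the packet within the message
    total: int       # number of packets in the whole message
    payload: bytes   # the slice of data carried by this packet


def to_packets(message_id: int, data: bytes, payload_size: int = 8) -> list:
    """Split a message into numbered packets (payload size is illustrative)."""
    chunks = [data[i:i + payload_size]
              for i in range(0, len(data), payload_size)] or [b""]
    return [Packet(message_id, n, len(chunks), chunk)
            for n, chunk in enumerate(chunks)]


def reassemble(packets: list) -> bytes:
    """Restore the message, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    if not ordered:
        raise ValueError("no packets received")
    expected = list(range(ordered[0].total))
    if [p.sequence for p in ordered] != expected:
        raise ValueError("missing or duplicate packets")
    return b"".join(p.payload for p in ordered)


if __name__ == "__main__":
    message = b"Road to the clinic is washed out"
    packets = to_packets(1, message)
    # Packets may arrive out of order, e.g. over different routes.
    arrived = list(reversed(packets))
    print(reassemble(arrived) == message)
```

The check for missing packets matters in practice: a receiver that detects a gap can request retransmission instead of passing on a corrupted message.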

5 OCHA – Project on Emergency Telecommunications with and in the Field

The major problems related to emergency communications in the field that still await practical solutions are as follows:
∗ the problem of safety and security in the field;
∗ the problem of ad-hoc telecommunication services for the affected population;
∗ the problem of restoring normal telecommunication services to the affected population after the disaster;
∗ the compatibility of equipment used in the field by various partners in international humanitarian assistance.
It is suggested that future efforts focus on the creation and operation of a global emergency telecommunication/information infrastructure, accessible 24 hours a day from any place on the earth.

5.1 Background and Mandate of the Project

The use of telecommunications by UN agencies and by non-governmental entities involved in humanitarian assistance has been a difficult and sensitive

issue, with potential implications of a political and technical nature. In spite of the trend towards globalization, our world of today is fragmented, and each sovereign country has its own system of laws, regulations, standards, and practices. Governments are often unwilling to allow importation and use of wireless telecommunication equipment by foreigners over their territories. As a consequence, using telecommunications with and in the field often necessitates difficult and time-consuming negotiations. Emergency telecommunications has been conceived to satisfy the needs of humanitarian assistance before, during and after emergencies. In rescue and relief operations, tight time limits, combined with the surrounding post-disaster chaos and the limited resources available, place the highest demands on management, logistics and coordination efficiency. The Tampere Convention on Emergency Telecommunications is the most important achievement of the project of OCHA, the Office for the Coordination of Humanitarian Affairs (DRB). The Tampere Convention provides the framework for the use of telecommunications in international humanitarian assistance, removes regulatory barriers, and protects providers of telecommunication assistance while safeguarding the interests of the host country. It satisfied the (often contradictory) requirements of all interested parties, being the best compromise possible at the time of its adoption. An integrated system for access to documents and exchange of information, both for Headquarters and field staff (Intranet) and for humanitarian partners (Extranet/Internet), is being developed and will be deployed in 2003. A revamped OCHAOnline, OCHA’s official web site, will constitute the platform for Intranet, Extranet and Internet access to OCHA-related information. OCHA will continue to explore and provide tools to both its staff and the humanitarian community to take advantage of emerging information and communications technology.
In particular, there will be a shift towards more web-based applications and remote access. Information tools, such as the OCHA Contact Directory, were refined and disseminated in 2002, improving and streamlining staff access to contact information. The OCHA ReliefWeb (http://www.Reliefweb.int/wget) has proven its great utility as an electronic clearinghouse for the members of the WGET (Working Group on Emergency Telecommunications). ReliefWeb is a powerful interface between OCHA and the external world; its public part is visited four million times a month. In addition to its basic informative functions, it shapes the public image of OCHA. Designed to serve the information needs of the international humanitarian relief community, ReliefWeb targets decision makers at all levels, from aid workers to government and UN officials, seeking to improve humanitarian response capacities through the timely dissemination of reliable information. ReliefWeb teams in New York, Geneva and Kobe, Japan, post updates throughout the day covering some 40 ongoing humanitarian emergencies, collecting and posting documents from over 700 sources. The site includes a map

centre, virtual library, training and vacancies sections, an appeals page, financial tracking section and other useful humanitarian resources and linkages. Online emergency coverage in the Asia Pacific region was consolidated through the ReliefWeb office in Kobe, ensuring service in the region’s time zone as well as 24-hour service globally. Almost 25,000 emergency response documents were disseminated in 2002 to ensure that time-critical information is accessible to facilitate humanitarian decision-making. These documents were published by humanitarian partners for 22 complex emergencies and over 95 natural disasters. A Virtual Library with more than 500 humanitarian reference documents was launched, as was the redesigned Humanitarian Directory, featuring over 300 organizations and links to 100 additional related sites. In collaboration with information partners, the unique identifier standard for natural disasters, the GLIDE (Global Identifier) number, was adopted. This standard allows for the integration and efficient exchange of disaster information among partners. The work towards increasing the awareness of international cooperation facilitating the use of telecommunications in humanitarian assistance was an important element in the project.

5.2 Constraints and Weaknesses of the Project

The size of the resources engaged indicates that emergency communications is really not seen as a key factor contributing to the success or failure of field operations, in spite of public declarations at various levels. The problem is that no global emergency telecommunication system exists, and none has ever been attempted. The existing Global UN Telecommunication Network connecting regional offices is not easily accessible in the field. In our fragmented world, the integrating/converging trends compete with separating/diverging ones, and there is opposition against globalisation.
Political division and mutual distrust, combined with a fear of foreign dominance, kept the telecommunication sector monopolized in all countries. Wealthy societies developed telecommunication networks to satisfy their own needs, and had no incentive to extend them over poor regions unable to repay the investment. As a consequence, emergency telecommunications today is a patchwork of various technologies, protocols, and equipment, not always working together smoothly. This fragmentation creates serious problems in the field that only new technology can solve at a reasonable cost. One of the serious problems still waiting for a practical solution is security and safety in the field. Good coordination requires a single coordinating body. This is especially important in view of the large number of entities involved in humanitarian assistance. Closely related to personal safety are problems of privacy, safety and security of information, and equipment. The open character of radio communications and the vulnerability of computer systems imply severe privacy and

security problems. Messages can be intercepted, and computers can be paralyzed for terrorist purposes. Electromagnetic attacks pose a potential danger even greater than viruses. For instance, a GPS receiver can be jammed, or a running vehicle can be stopped instantly on the road by irradiating it from an electromagnetic weapon. The problem of telecommunication services for the affected population could not be solved satisfactorily in most cases. The capacity of ad-hoc telecommunication networks created to coordinate relief activities in the field is insufficient to also satisfy the communication needs of the population. A satisfactory solution to that problem could be offered only by the application of new technologies. Restoring telecommunication services to the affected population after a disaster strikes is another problem of fundamental significance waiting for a practical solution. New technology is capable of solving that problem. The 21st century will be that of integrated computer and broadband telecommunications. The Internet is becoming a worldwide standard, providing multimedia connectivity and compatibility at the protocol level for communications at headquarters level for all partners in humanitarian assistance. The Information Revolution has the potential to radically improve the efficiency of our field operations. Wireless communications work even under the worst conditions, including natural disasters and emergencies. WGET considered the use of the Internet in field operations, but found it not fully suitable for operational communications, especially in the initial phase of an emergency, when real-time exchange of information is most essential. The limited number of Internet access points in disaster-prone areas was considered an obstacle. Now we see a phenomenal growth of Internet services, and improvement of their quality, accompanied by new developments in radio technology.
The Internet Protocol could solve difficulties due to incompatible communication equipment and protocols. However, to benefit fully from the Internet in the field, a broadband wireless network is necessary. Global problems require global solutions that only high technology can offer. Telecommunications are crucial here. A physical wireless telecommunication infrastructure, accessible 24 hours a day from any place, is a necessary element of disaster management on a global scale and deserves the highest priority. It would be a Global Disaster Relief Communication and Information Infrastructure, an integral part of an upgraded Global Information Infrastructure (GII) and future “3rd Generation” global systems. Indeed, the new technology offers wireless bandwidth-on-demand services everywhere on the globe, 24 hours a day, with guaranteed quality and reliability, and at a reasonable price. The capacity allows for transmission of thousands of computer files of 1 Mb each in a second. The ability to handle multiple channel rates, protocols and service priorities provides the flexibility to support a wide range of applications, including computer LAN interconnection, the Internet and corporate intranets, multimedia communication, wireless backhaul, etc., offering access speeds thousands of times faster than today’s standard analogue modems. Although optimized for two-way fixed-site terminals, the new low Earth orbit (LEO) satellite technology is able to serve transportable and mobile terminals in open space, such as those for land-transport, maritime and aviation applications. Except for user terminals, the system may not need any earth-based structure to operate. With appropriate redundancy, the system is thus completely insensitive to disasters. New technologies open new vistas. It is only a question of time before a global LEO satellite system fully satisfies the needs of the humanitarian assistance community. The system would consist of two parts interconnected via radio waves. One part, global, would be the “Internet-in-the-Sky”, permanently accessible from any place; it could be shared with other applications. The other part would be a set of temporary, dedicated local networks created ad hoc in the field according to local needs. New developments in signal processing make it possible to make better use of the current capabilities of more traditional satellite technology. Such a future global “Internet-in-the-Sky” would greatly assist disaster relief. It would enable new methods of coordinating the many facets of disaster assessment and response, and better use of the limited resources available. A field manager would be able to exchange timely multimedia information with all those involved (medical doctors, specialized experts, databases, etc.) from vehicle and from office, 24 hours a day, using his/her standard laptop computer and/or personal digital assistant. It would enable virtual “tele-presence” and participatory decision-making based on knowledge gathered from wherever in the world it might be located. Automatic generation and processing of distress/emergency alert signals would be possible, contributing to the safety and security in the field requested for so long by so many. The distress signal would carry the geographic position and the fingerprint of the calling person.
The necessary information would automatically be distributed among the local manager, the headquarters, the nearest rescue team on duty, family, etc., and each would receive only what he/she needs and is authorized to receive. Thanks to its permanent presence and universal accessibility, the future system would also play an important role in timely warning and in effective disaster preparedness. Moreover, its enormous transmission capacity would enable rapid post-disaster recovery. Full telecommunication services could be offered to the population in hours or days after the disaster. No such universal global emergency communication/information infrastructure exists or has been attempted. Its implementation would thus involve development of new hardware and software, and financial investments. However, most of elements required are available, waiting to be integrated; others are under development.
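The principle that each recipient receives only what he/she needs and is authorized to receive can be illustrated with a minimal Python sketch. The text does not specify any message format, so all field names, recipient roles and the authorization table below are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class DistressAlert:
    """Hypothetical distress message; the field names are illustrative only."""
    caller_id: str       # stands in for the "fingerprint" of the calling person
    latitude: float      # geographic position carried by the signal
    longitude: float
    medical_notes: str


# Hypothetical authorization table: fields each recipient role may receive.
AUTHORIZED_FIELDS = {
    "local_manager": {"caller_id", "latitude", "longitude", "medical_notes"},
    "headquarters": {"caller_id", "latitude", "longitude"},
    "rescue_team": {"latitude", "longitude", "medical_notes"},
    "family": {"caller_id"},
}


def view_for(alert: DistressAlert, role: str) -> dict:
    """Return only the fields that the given role is authorized to receive."""
    allowed = AUTHORIZED_FIELDS.get(role, set())  # unknown roles receive nothing
    return {name: value for name, value in asdict(alert).items() if name in allowed}


def distribute(alert: DistressAlert) -> dict:
    """Build one filtered view of the alert per known recipient role."""
    return {role: view_for(alert, role) for role in AUTHORIZED_FIELDS}
```

In this sketch the filtering is done once at the distribution point, so that no recipient ever holds fields outside its authorization; a real system would also have to authenticate recipients, which is not shown here.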

Yu.I. Shokin and L.B. Chubarov

6 The Integrated Regional Information Networks (IRIN)

The Integrated Regional Information Networks (IRIN), part of the OCHA, are specialized information units dedicated to improving the international community’s response to humanitarian crises by providing timely, strategic and relevant information. IRIN staff, based in strategic locations in Africa and central Asia, draw information from a wide variety of sources, sift and verify it, and prepare reports on 46 countries in sub-Saharan Africa and eight in central Asia. IRIN provides daily news stories, special features, chronologies, interviews, weekly news digests and analytical reports. By September 2002, IRIN had produced over 6,300 individually researched and verified reports, double the number produced in the previous year. Some 100,000 people worldwide read the reports, delivered directly to the subscriber’s inbox daily, while the IRIN web site receives about 3.5 million hits per month. The steady increase in IRIN’s readership each year testifies to the value that the humanitarian community, constituting some 65 per cent of subscribers, places on IRIN.

When crisis or disaster hits a country, communications are often one of the first casualties. Reliable sources dry up, government agencies collapse, and media images do not give the full picture. Without constantly updated and accurate information on washed-out roads, bombed airfields, landmines, disease-infested water, epidemics, or civil unrest and outbreaks of violence, it is impossible to respond effectively. IRIN pioneered the use of e-mail and web technology to deliver and receive information to and from some of the most remote and underdeveloped places cheaply and efficiently. Its reporting focuses on strengthening universal access to timely, strategic and non-partisan information so as to enhance the capacity of the humanitarian community to understand, respond to and avert emergencies.

IRIN services are provided free of charge and are available in a range of forms, including analytical reports, fact sheets, interviews, daily country updates and weekly summaries. These products are available through the IRIN web site at http://www.irinnews.org and through an e-mail distribution service that includes several management tools aimed at reducing information overload.

7 EPIX, the Emergency Preparedness Information Exchange, on the World Wide Web

The Emergency Preparedness Information Exchange (EPIX) is a computer-based emergency management information system operating on the worldwide Internet. Its primary purpose is to facilitate the regular exchange of ideas and information among Canadian and international public and private sector organizations and individuals about the prevention of, preparation for, recovery from and/or mitigation of risk associated with natural and technologically-based hazards. The main objective of this work is to improve the timely exchange of information among those affected by and/or concerned with disasters and their consequences through the application of cost-effective, reliable and accessible communications and information (telematics) infrastructure. A key concern is to ensure that all stakeholders can participate and remain in important decision-making and knowledge-building processes regardless of physical location before, during and after disasters strike.

The research focuses on testing contemporary and emerging telematics technologies (including new media) in order to evaluate and improve their use in disaster management activities. This work incorporates advanced telecommunications (especially space-based and terrestrial wireless technologies), the development of applications over networks, and the facilitation of technology transfer to the disaster management community through partnerships and collaborative networking initiatives.

A major problem for emergency managers is forecasting bandwidth requirements before disasters occur. This problem applies not only to downloading data at disaster sites but also, conversely, to uploading data to emergency operations centres and support services, especially when there is a requirement for transmitting image data. Satellite telecommunications systems would include GEO, MEO and LEO technologies. The Telematics Research Lab (TRL) has recently acquired a GEO satellite antenna farm at an adjacent research park on the Simon Fraser University (SFU) Burnaby Mountain campus. These facilities will be upgraded and augmented with other ground station facilities. High-speed fibre optic data service is also available at this site. On the main campus, the TRL is currently utilizing a 2 Mbps VSAT system provided by CRC that is interconnected to wireless and wireline/fibre networks through its co-location with the main campus computing services.
Establishing a high-speed space/terrestrial gateway at SFU interconnected with CA*net 3 (OC-48) would enable the distribution of raw or processed remote sensing imagery and support other large-scale computing applications. SFU is scheduled to house the western Canada end of CA*net 3, which will provide the highest-speed networking available in Canada as a test bed for high-capacity networking research. Establishing appropriate downlinking capabilities would enable a variety of space-based collaborative initiatives (including those on the new International Space Station) to be shared over CA*net 3 and the contemporary Internet to form a virtual lab facility. The Wireless Application Protocol (WAP) allows the implementation of services that can deliver content to devices with small computing and display capabilities. Such services can be placed on top of conventional web and database servers to greatly extend the scalability and applicability of information resources.
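The bandwidth forecasting problem raised in this section can be made concrete with a rough estimate of how long an image transfer takes over the links named in the text. The following Python sketch is only an illustration: the 50 MB image size and the optional efficiency factor are assumptions that do not come from the source, and the OC-48 line rate of 2488.32 Mbps is the nominal SONET value.

```python
def transfer_time_seconds(size_bytes: float, link_bps: float,
                          efficiency: float = 1.0) -> float:
    """Idealized transfer time: payload bits divided by usable link rate.

    `efficiency` (0 < efficiency <= 1) is an assumed fraction of the nominal
    rate left after protocol overhead; 1.0 means no overhead at all.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bps * efficiency)


# Nominal link rates in bits per second.
LINKS_BPS = {
    "2 Mbps VSAT": 2e6,             # VSAT system mentioned in the text
    "OC-48 (CA*net 3)": 2.48832e9,  # nominal OC-48 line rate
}

IMAGE_BYTES = 50e6  # hypothetical 50 MB remote sensing image

for name, bps in LINKS_BPS.items():
    seconds = transfer_time_seconds(IMAGE_BYTES, bps)
    print(f"{name}: {seconds:.2f} s")
```

Under these assumptions the same image needs 200 seconds on the VSAT link but well under one second on OC-48, which is why image traffic dominates any bandwidth forecast for field links.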

8 Project HPN2000

The supported project was aimed at porting Telematics emergency preparedness/disaster response systems over to PolyLAB systems, and then integrating them into a powerful resource, using a combination of HPC systems and advanced networking. Telematics disaster information resources will be ported to the PolyLAB HPC-class delivery system (Kasei) and the backup development system (Nirgal). Kasei will host web services via multiple host names. Kasei and Nirgal will supply DNS service for the disaster preparedness systems. A wide range of services now reside on Kasei. These include:

∗ Emergency Preparedness Information Exchange (EPIX);
∗ United Nations International Decade for Natural Disaster Reduction Information Services;
∗ Emergency Preparedness Canada;
∗ Safe Guard (an emergency preparedness public awareness site);
∗ BC Provincial Emergency Program;
∗ BC Inter-Agency Emergency Preparedness Council;
∗ BC Emergency Social Service Association;
∗ Industry Canada Emergency Telecommunications;
∗ Hazard Net (Telematics-led international demonstration project).

A second site has also been developed with the NATO Civil Protection Committee, using much of the technology developed for the above, including XML/OpenMath encoding of disaster response information to support natural disaster mutual aid efforts among nations in Europe and North America. Kasei is an Enterprise 450 server with 4 redundant processors, 2 redundant internal power supplies, optical ATM and 100 Mbps networking, and large amounts of disk space and memory. The Kasei system has been placed in a secure area, with UPS backup and emergency power provisions. The room in which it is located has a halon fire-prevention system. The range of integrated networking solutions implemented has received a large amount of national and international attention. A large, flexible, Integrated Network was built, augmenting the Virtual Emergency Management Information System (VEMIS).
An ATM switch was integrated with the satellite communication system. TCP/IP packets were then encoded into ATM cells by the switch, using the CLIP (Classical IP over ATM) standard, transmitted via T1 to the satellite modem, and uplinked via the Anik E1 spacecraft to the Communications Research Centre in Ottawa. At standard operating levels (6 dB signal-to-noise ratio), no packet losses were observed. Similar VSAT technology will be used by Telematics, PDG, and NASA to transfer data from the Haughton Crater, Devon Island, in Nunavut, back to NASA Ames Research Center. The wireless network, based on 56 kbps packet radio technology, was used for a variety of tests, fully integrating it with the information services located on Kasei. It provided network services in the Thunderbird IV tests, a joint Provincial/Federal emergency exercise. Remote sites use repeaters to broadcast signals back to Internet gateways located at important locations, providing TCP/IP service throughout a large geographical region. Kasei tunneled multicast video data from the TRL over the space-based link to the Federal Joint Alternate Site (JAS) in Cloverdale, and that data was then radiated via 900 MHz spread-spectrum to a vehicle located in a field nearby. Thus a high-performance satellite connection was made available to an area around the receiving Earth station. JAS was provided with a multiple-node wireless communications gateway system that allows the local intranet to link into external internets via various communications technologies. Each point of wireless transmission is firewalled for security. Two buildings at JAS were connected with spread-spectrum technology, with one site providing 56 kbps access and the other providing satellite communications via two satellite links. TCP analysis software was used to graphically check the packet flow through the links. SatCom links were controlled over the network by connecting Kasei to the control facilities on the satellite modem. Due to the advances in making the system TCP/IP compatible and able to maintain connections to PDG OpenMath technology, the C3 load monitor will be used for link status monitoring. JavaStation booting over conventional networking, even when not on the same subnet as the boot server, was established. A DHCP relaying system was placed on Kasei to ensure that boot requests could be fed to Nirgal for processing. The JavaStations were booted over a range of VSAT satellite communication links. These tests went perfectly and were performed in conjunction with the Communications Research Centre at the federal government’s Team Canada booth at InterComm99 in Vancouver as well as during the Thunderbird IV emergency exercise.
The JavaStations were then used as terminals to access the Virtual Institute Networking system and the various disaster information systems on Kasei. JavaStations were also booted over the 900 MHz spread-spectrum links. The resulting integrated network has thus demonstrated the ability to boot computers from high-performance servers located anywhere on the planet, and then provide access back to collaborative services, including computational services, on the remote network. It is this technology that will allow users to access a radio network from a portable computer, relaying data back to a vehicle, and from there back to a satellite communication system. From there, services on high performance networks and high performance computers can be accessed from any location within range of the satellite.

The PolyMath Development Group (CECM) and the Telematics Research Lab (CPROST) at SFU are developing a high performance, next-generation networking capacity within a high performance computing hardware context. Utilizing a mix of ATM networking, wireless networking, and various modes of satellite networking, it will facilitate the development of projects requiring both significant computing and networking resources. This project is aimed at the integration of the resources and the development of status and control interfaces for the emergency information system VEMIS. One of the PDG’s SUN Microsystems Enterprise 450 servers is currently being integrated into the Telematics Lab. It will act as the communications hub between the local network, broadband-capable Very Small Aperture Terminal (VSAT) satellite links, the VEMIS wireless emergency network and the broadband CA*Net2 ATM network. It will mediate between the various technologies, using them efficiently and providing rapid access to high-performance computing resources. Initially the facility will support deployment of crucial resources for disaster preparedness (information and collaboration), emergency alternative networking (in the event of loss of CA*Net2 connectivity), and centralized access to information and collaboration during a disaster. Remote sensing information will be integrated with the use of HPC resources to predict damage and to warehouse important information. These resources will be tied together through status and configuration interfaces developed in Java using PolyMath technologies. In addition, existing Web-based administration and information services will be upgraded.
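The encoding of TCP/IP packets into ATM by the switch, mentioned earlier in this section, has a cost in link capacity that matters on a narrow T1 satellite feed. The Python sketch below estimates that cost. It assumes the usual arrangement for Classical IP over ATM, namely LLC/SNAP encapsulation (8-byte header) carried in AAL5 (8-byte trailer, padded to whole 48-byte cell payloads, 53-byte cells); the source does not state which encapsulation the project actually used, so this is an assumption, and the function names are hypothetical.

```python
import math

ATM_CELL_BYTES = 53          # 5-byte cell header + 48-byte payload
ATM_PAYLOAD_BYTES = 48
AAL5_TRAILER_BYTES = 8       # AAL5 CPCS trailer
LLC_SNAP_HEADER_BYTES = 8    # assumed LLC/SNAP encapsulation header


def cells_for_ip_packet(ip_bytes: int) -> int:
    """Number of ATM cells needed to carry one IP packet of `ip_bytes` bytes."""
    pdu = LLC_SNAP_HEADER_BYTES + ip_bytes + AAL5_TRAILER_BYTES
    return math.ceil(pdu / ATM_PAYLOAD_BYTES)  # AAL5 pads to whole cells


def wire_efficiency(ip_bytes: int) -> float:
    """Fraction of transmitted bytes that are IP packet bytes."""
    return ip_bytes / (cells_for_ip_packet(ip_bytes) * ATM_CELL_BYTES)


for size in (40, 576, 1500):
    print(f"{size:5d}-byte packet: {cells_for_ip_packet(size):3d} cells, "
          f"efficiency {wire_efficiency(size):.1%}")
```

Under these assumptions a 1500-byte packet occupies 32 cells, while a 40-byte acknowledgement still needs 2 full cells, so small packets use the link far less efficiently than large ones.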

9 Virtual Emergency Management Information System (VEMIS)

The Virtual Emergency Management Information System (VEMIS) is an experimental alternative backbone networking system comprising both cabled and wireless components. It provides robust, fault-tolerant fixed and mobile communications to integrate organizational management systems before, during and after emergencies. During emergencies, when terrestrial telecommunication networks are damaged or severely impaired, alternative and flexible networking arrangements become critically important to ensure ongoing and effective coordination of emergency response and relief efforts. The challenge of a Virtual Emergency Management Information System is to ensure that existing emergency information management and decision-making support systems can be integrated through appropriate and robust networking infrastructure. VEMIS is being designed to take advantage of current and new communication and information technology, including services developed through EPIX. The initial prototype incorporates a new wireless Internet system that operates at 56 kbps, with the capability to upgrade to a point-to-point system running at 1.5 Mbps and to upwardly compatible systems as higher speed capabilities are reached. These facilities are interconnected to the Internet via landline and satellite-based telecommunications links and provide a seamless local-to-international internetworking environment for the development of specialized disaster management services. Results from this work are contributing to a broader understanding of how such technology can be used to develop and sustain Virtual Emergency Operations Centres that allow emergency managers to remain in the information loop during emergencies, especially when they are unable to travel to designated operations centres or when their physical presence is not required.

VEMIS embodies several TCP/IP-based technologies, including traditional cabled Ethernet systems and commercial RF equipment such as satellite and spread-spectrum links. However, at the heart of VEMIS is a new 56 kbps amateur packet radio system designed by dB Microwave Inc.

References

1. IRIN - A UN initiative that saves lives and money. What is IRIN? http://www.irinnews.org/aboutirin.asp
2. IRIN evaluation report, May 2003 http://www.irinnews.org/webspecials/civilprotect/default.asp
3. Struzak R (2000) Evaluation of the OCHA (DRB) project on emergency telecommunications with and in the field. United Nations Office for the Coordination of Humanitarian Affairs (OCHA), United Nations, New York Geneva http://www.reliefweb.int/telecoms/evalu/OCHA 1 5.html
4. Information and resources available for the provision of emergency telecommunications during relief operations http://www.reliefweb.int/telecoms/
5. Emergency communications resources from the emergency preparedness information exchange (EPIX) at Simon Fraser University http://epix.hazard.net/topics/emcom/emcoms.html
6. Disaster mitigation and emergency preparedness at the centre for policy research in science and technology (CPROST) at Simon Fraser University http://www.cprost.sfu.ca/research.html
7. Building emergency lanes along the new information highways and skyways http://www.cprost.sfu.ca/trl proj.html
8. Wood M (1996) Global disaster communications. Part 1. First Edition, G4HLZ, Disaster Relief Communications Foundation http://www.reliefweb.int/library/dc1/dcc1.html
9. Anderson P, Jorgenson L, Braham S Project: HPN2000 - High performance next-generation networking environment for research into high performance computing/networking. Final report. Simon Fraser University
10. (2002) United Nations Office for the Coordination of Humanitarian Affairs Symposium on Best Practices in Humanitarian Information Exchange. Palais des Nations Geneva, Switzerland

RESOURCES ON EMERGENCY COMMUNICATIONS

Organizations and Special Projects

Intergovernmental Conference on Emergency Telecommunications (ICET 98) http://www.itu.int/newsarchive/projects/ICET
Working Group On Emergency Telecommunications (WGET) http://www.reliefweb.int/telecoms/intro/wget.html
Global Disaster Information Network http://www.state.gov/www/issues/relief/july.html
Association of Public Safety Communications Officials - Canada http://www.apco.ca
Canadian Coast Guard http://www.ccg-gcc.gc.ca/mcts-sctm/main.htm
Communications Security Establishment (Canada) http://www.cse.dnd.ca
Industry Canada - Emergency Telecommunications Website http://spectrum.ic.gc.ca/urgent/index.html
Emergency Communications for Southwest British Columbia http://www.ecomm.bc.ca
The Disaster Relief Communications Foundation http://ourworld.compuserve.com/homepages/mark a wood
The Emergency Information Infrastructure Partnership Forum http://www.emforum.org/index.html
National Communications System (USA) http://www.ncs.gov
The Public Safety Wireless Advisory Committee http://pswac.ntia.doc.gov

Federal Communications Commission (FCC) http://www.fcc.gov
National Telecommunications and Information Administration (NTIA) http://www.ntia.doc.gov
The National Public Safety Telecommunications Council (NPSTC) http://rmlectc.dri.du.edu/npstc

Emergency Information Networks

California Emergency Digital Information Service (EDIS) http://edis.oes.ca.gov
Emergency Managers Weather Information Network http://iwin.nws.noaa.gov/emwin/index.htm
FEMA News Desk http://www.fema.gov/fema/news.htm

Emergency Notification Systems. Wireless Systems

Intelligent Wireless Solutions Corporation http://www.inwireless.com/iws.htm

On-line Discussion Groups

Networks in Emergency Management gopher://hoshi.cic.sfu.ca:5555/11/epix/topics/emcom/NEM

Reference Materials

Information Centres

Disaster Communications Law and Policy http://www.law.indiana.edu/webinit/disaster

On-line Publications

Computing and Communications in the Extreme: Research for Crisis Management and Other Applications http://www.nap.edu/readingroom/books/extreme
Disaster Communications Manual http://www.reliefweb.int/library/dc1/dcc1.html
Guidelines for the Design and Construction of Mobile Command Posts and Similar Emergency Response Vehicles http://www.epc-pcc.gc.ca/pub/manuals/en mobile.html
Incident Command System Forms http://www.dot.gov/dotinfo/uscg/hq/g-m/nmc/response/forms/Default.htm

The Intelligent City And Emergency Management In The 21st Century http://webwrite.com/cespub2.html

Bibliographies

Emergency Telecommunications compiled by UN-DHA http://www.unog.ch/freq/biblio.html
Computer Applications In Disaster/Emergency Planning http://epix.hazard.net/topics/emcom/nceer.940303
Computer Use In Emergency or Disaster Management http://epix.hazard.net/topics/emcom/nceer.940201
Disaster Communications http://epix.hazard.net/topics/emcom/nceer.950802
Earthquakes and Telecommunications #1 (Bibliography) gopher://hoshi.cic.sfu.ca:5555/00/epix/topics/emcom/earthquake.telecoms
Earthquakes and Telecommunications #2 (Bibliography) gopher://hoshi.cic.sfu.ca:5555/00/epix/topics/emcom/earthquake.telecomms2
Expert Systems in Emergency Management http://epix.hazard.net/topics/emcom/nceer.940206
GIS in Emergency/Disaster Communications http://epix.hazard.net/topics/emcom/nceer.950106
Mass Media and Natural Disasters http://epix.hazard.net/topics/emcom/nceer.941017
Post Disaster Communications (Bibliography) gopher://hoshi.cic.sfu.ca:5555/00/epix/topics/emcom/post.disaster.comms.txt
Telecommunications Equipment: Mitigation of Seismic Damage http://epix.hazard.net/topics/emcom/nceer.941023

Amateur Radio

AMSAT Home Page http://www.amsat.org/amsat/AmsatHome.html
The American Radio Relay League http://www.arrl.org
Radio Amateurs of Canada http://www.rac.ca
The Radio Society of Great Britain http://www.rsgb.org
Radio Amateurs’ Emergency Network http://www.sgi.leeds.ac.uk/raynet/index.htm
Radio Amateur Civil Emergency Service http://www.qsl.net/races
San Diego County RACES http://www.races.sandiego.ca.gov
Russian Amateur Radio Emergency Service http://rw3ah.access.ru

Interoperability and General Telecommunications Information Sources

International Telecommunications Union http://www.itu.ch
Information and Communication Technologies for Sustainable Development http://www.bvx.ca/ict

International Multimedia Teleconferencing Consortium (IMTC) http://www.imtc.org
Internet Engineering Task Force (IETF) http://www.ietf.org
Telecom information resources http://china.si.umich.edu/telecom/telecom-info.html

High Performance Computing in Engineering and Science

M. Resch

High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Allmandring 30, 70550 Stuttgart, Germany
[email protected]

Summary. High Performance Computing (HPC) has left the realm of large laboratories and centers and has become a central part in simulation in engineering and science. We summarize the basic problems and describe the state of the art. A concept for an integrated approach is presented. This covers hardware and software aspects. Examples are presented to show the potential of an HPC workbench for engineering and science.

1 Introduction

High Performance Computing (HPC) simulation has long been a tool for scientific discovery and engineering development. In scientific research, simulation is a third way of gaining insight, besides the classical methods of theoretical and experimental work. The advantages of simulation over these two traditional methods are manifold: simulations can easily be reproduced and repeated anytime and anywhere, given the necessary computational resources; simulation experiments can easily be modified, with a practically unlimited number of variations at the scientist's hand; and all kinds of dangerous experiments can be avoided by using computer simulations. The best example of the latter is the US Accelerated Strategic Computing Initiative (ASCI) project [1], which aims at replacing atomic bomb tests by advanced simulation on supercomputers.

In engineering, simulation has become a central part of the life cycle of commercial products. Simulation is introduced as early as the design phase and lasts until the customer support phase [2]. The key issue in industry is a reduction of cost. On the one hand, this can be achieved by avoiding lengthy, tedious and expensive experiments. On the other hand, parameter variations can help to focus on the most promising design alternatives very early in the product development process, so that the worst scenarios can be sorted out early on.

The growing importance and changing role of simulation both in science and in the industrial production process lead to increased requirements both in terms of performance and in terms of usability of HPC systems. These issues are addressed in this paper. The structure of the paper is as follows: section 2 analyzes the key issues for HPC in science and engineering with respect to both hardware and software. From this we derive a concept for an integrated simulation workbench, which is described in section 3. An example of an integrated simulation based on such a workbench is described in more detail in section 4.

2 Key Issues for HPC Simulation

There is no exact definition of a high performance computer. Commonly it is assumed that the most powerful, and most expensive, systems at any point in time are high performance computers or supercomputers. In order to be ahead of the competition, supercomputers have always made use of innovative concepts both in hardware and software. Many of these concepts were later integrated into standard products. Today there is an ongoing discussion about which way to go in hardware and software development.

2.1 Hardware Issues

When Seymour Cray founded his first company he did so in order to have the freedom to develop the fastest supercomputer without having to consider the general, commercially driven market for computers. Until then a computer was by definition also a supercomputer, and only a few research and government institutions were able to afford such systems. With the reduction in prices in the 60s a much wider market was opened for manufacturers. Since then hardware development has been driven mainly by the mass market. Cray was able to build specially designed supercomputers because of the cold war thinking of the time. Supercomputers were needed to build better weapons. Research institutions and universities benefited from this by being able to acquire supercomputers at a price that was to some extent subsidized by governmental organizations concerned with weapon development and security. With the fall of the iron curtain in 1990 and the end of the cold war this unique and somewhat pathological market situation changed dramatically. US governmental institutions changed their funding strategy and turned away from vector supercomputing. It became common wisdom that a thousand commodity parts would do better than one highly sophisticated processor and would, besides, be cheaper. This thinking has had a major impact on supercomputing for at least one decade.
Parallel Computing

In the early 90s parallel computing, a concept that was investigated as early as the 60s, finally achieved a breakthrough in the market. Increased speed was henceforth to be achieved by an increase in the number of CPUs and not by an increase in clock speed only. While the clock rates of high speed processors were increasing rather slowly, offering an increase in performance by a factor of 2-4 over a period of 2-4 years, massively parallel systems promised a thousandfold increase in speed, with the potential to go to 10,000-fold soon. While the Japanese continued their vector programs (the VPP line from Fujitsu and the SX line from NEC), the US basically gave up on the idea and focussed on parallelism exclusively [1]. Initially massively parallel systems seemed a feasible approach. Up to 64,000 CPUs in a single system were meant to be the solution for tackling grand challenge problems. From a hardware perspective, however, the problems of integration were too big and the costs too high to follow such an approach. Systems like these failed to become a long-term market success. A number of companies disappeared or changed their focus. Intel, although it had built one of the best systems of that era, the Intel Paragon, gave up on supercomputing and focussed on its core business of building processors. Among the few survivors of this first phase of commercial parallelism was Cray, which had the necessary technical skills to build systems like the T3D and the T3E [3]. Putting together highly sophisticated systems built from a large number of single components had always been part of Cray’s success. Parallelism, once learned, was soon mastered by the market leader. Cray fully understood that moderate parallelism (in the range of 512 to 1000 CPUs) was a sweet spot in terms of performance versus ease of use. Cray was thus the first to deliver 1 TFLOP/s of sustained performance for a real application, as early as 1999, on a T3E.

Clusters

With the integration of so many components in a single system beyond the reach of engineers, an old concept again attracted interest.
Clusters of workstations had been on the market already in the 80s. The purpose of the concept at that time was increased flexibility and better usage of the resources of engineers working in groups. Throughput, failover and cooperation were the key words. The concept became attractive for a wider community when massively parallel systems became too expensive. In the mid 90s an idea emerged that was quickly picked up both for hardware and software. Commodity parts Off The Shelf (COTS) was driven by financial considerations rather than by technical ones. Standard components were sold in millions of copies. Prices for these components were therefore low compared to specialized ones. Development costs were easily absorbed by the huge market. Cheap and fast microprocessors were therefore expected to replace the expensive vector machines, which were seen as the dinosaurs in the supercomputing landscape, doomed to die out inevitably within the next few years. Clusters of microprocessor-based workstations were seen as the future of supercomputing. On the claim that workstations are typically idle for 95% of the time and that high speed networks would allow them to be clustered to form a supercomputing resource [4], clusters were seen as a replacement even for highly integrated massively parallel systems. An internal NASA report in the early 90s found that 90% of all in-house codes would be able to exploit the potential of such cheap clusters.

In the late 90s a number of particularly interesting projects aimed at exploiting this potential for supercomputing. The best known were perhaps the US Cplant project [5] and the Japanese Real World Computing Project (RWCP) [6]. Cplant was built from Compaq/DEC Alpha processors and made use of a specially designed interconnect network. RWCP was based on Intel technology [7] and used its own network [8]. Both projects had a large impact on the community but did not turn into real products. However, the experience gained in these projects gave rise to a new type of cluster technology. Both Intel and AMD grasped the potential of the simulation market and provided products tailored for numerically intensive computing. As a result, most clusters today are based on nodes of dual-processor boards, equipped either with Intel Xeon processors or with AMD Athlon processors [9]. These systems target applications that require only 32 bit precision. With the introduction of the AMD Opteron in 2003, 64 bit processing has become possible at a competitive price level [10].

Interconnection Networks

One of the most important problems for clusters was and is the interconnect. The standard network, Ethernet, was for a long time too slow to make clusters interesting for more than just embarrassingly parallel problems. Two technical concepts set out to overcome the lack of network performance for clusters: Myrinet [11, 12] and Quadrics [13].
Both achieved that goal and became de facto standards for cluster networking. However, there is still a bandwidth limitation that stems from the use of standard components. With standard PC processor boards the bandwidth is limited by the PCI bus. This will only be overcome when PCI-Express becomes available. With respect to latency it is interesting to note that both Myrinet and Quadrics are much better than traditionally designed supercomputers. Both networks are more expensive than standard Ethernet, which is why high speed networking is still a matter of cost. This may change once Gigabit Ethernet and Infiniband [14] become widely available.

Clusters of SMPs

The limitations of clusters are well known [9]. Network connectivity is poor compared to traditional supercomputers. Having thousands of parts integrated into a single system makes management and programming very difficult. On the other hand, shared memory systems (SMPs) are easy to program but do not scale beyond 8 to 16 processors. A compromise is a cluster of shared memory systems. Using fat nodes dramatically reduces the requirements for network interconnectivity. A typical configuration may have 8 nodes with 64 CPUs each. This is a system with a total of 512 processors, yet the network requires only an 8x8 switch. Most ASCI systems were built on this concept. The node size varies between 4 and 512 processors. In any case it was possible to substantially increase the number of CPUs without having to build complex interconnects. As with all compromises, someone had to pay the price for giving up on complex networks. In the case of clusters of SMPs it was those end users who needed to use all CPUs. While smaller cases were easily run on a single SMP box without ever relying on the network speed, large problems required network performance, and most of them never saw any.

The Earth Simulator

While the US, especially in the ASCI project [1], was following the path of clustering microprocessors, Japan kept developing vector processors. Based on an estimated requirement for simulating weather, climate and earthquake phenomena, which are extremely important for Japan, the government set up the Earth Simulator project [15] to develop the world’s fastest supercomputer with a peak performance of 40 TF/s. The system is based on three main principles:

Processor-Memory: The only way to achieve high sustained performance is to use a vector processor with a high bandwidth memory subsystem.

Fast Nodes: In order to avoid degradation of memory access when the number of processors increases, the number of processors per node should be small. For the Earth Simulator it is eight processors per node. In addition a special memory subsystem was chosen that can sustain a high memory bandwidth for all eight processors simultaneously.
Network Interconnect: High bandwidth and low latency interconnects are indispensable for large systems. For the Earth Simulator a special 640x640 switch was designed.

The system was installed in 2002. It immediately took the number one position on the Top500 list [16] with a Linpack performance of more than 35 TF/s. This was seven times faster than the second fastest system on the list in 2002. In 2003 it was still three times as fast as the second ranking system. Even more impressive is that only a few months after installation the Earth Simulator was able to show a sustained performance in the range of 20 TF/s for real applications.
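As a rough check of the node arithmetic quoted above, the following Python sketch recomputes the processor counts for the cluster-of-SMPs example (8 nodes with 64 CPUs each) and for the Earth Simulator. It assumes that a 640x640 switch corresponds to 640 nodes with one network port each, which the text implies but does not state. The function name cluster_summary is purely illustrative.

```python
def cluster_summary(nodes: int, cpus_per_node: int) -> dict:
    """Total CPU count and switch size for a cluster of SMP nodes.

    Assumption: every node has exactly one network port, so N nodes
    need an N x N switch.
    """
    return {
        "total_cpus": nodes * cpus_per_node,
        "switch": f"{nodes}x{nodes}",
    }


# Cluster-of-SMPs configuration from the text: 8 nodes with 64 CPUs each.
smp = cluster_summary(nodes=8, cpus_per_node=64)

# Earth Simulator: 8 processors per node (from the text) and 640 nodes
# (inferred from the 640x640 switch).
es = cluster_summary(nodes=640, cpus_per_node=8)

# Lower bound on Linpack efficiency from the quoted figures:
# more than 35 TF/s measured against a 40 TF/s peak.
linpack_efficiency = 35.0 / 40.0

print(smp)  # {'total_cpus': 512, 'switch': '8x8'}
print(es)   # {'total_cpus': 5120, 'switch': '640x640'}
print(f"Linpack efficiency > {linpack_efficiency:.1%}")  # Linpack efficiency > 87.5%
```

The comparison makes the design trade-off visible: fat nodes keep the switch small, while the Earth Simulator accepts a far larger switch in exchange for small, fast nodes.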

New US Projects

After turning away from vector processors, the US strategy focussed on massively parallel systems. In the ASCI project [1] a number of systems were built, all with thousands of processors. Most of them were loosely coupled systems with slow networks. The advent of the Earth Simulator seems to have turned the tide. A number of projects were initiated to catch up, at least in terms of Top500 Linpack performance. The most interesting ones are again driven by Cray technology. Oak Ridge National Laboratories have started to build a large vector facility based on Cray’s X1 architecture [17, 18]. The system is supposed to compete with the Earth Simulator in sustained performance. At the same time Sandia National Laboratories have initiated a project called "Red Storm" [19]. The architecture of Red Storm follows the excellent concepts of the former Cray T3E architecture. It will use standard AMD processor technology but will enhance these standard components with a specially designed interconnect. When finished, Red Storm is expected to deliver very high performance at a price that can compete with commodity products.

2.2 Software Issues

The simulation process can be split into a number of individual steps. These are not always clearly distinct from each other. Furthermore, the traditional triad of pre-processing, processing and post-processing has to give way to more interactive approaches. The main steps one has to go through in a simulation are, however, still the same:

Code Preparation

In preparing code for a simulation the main issue, besides portability, is optimization for the chosen platform. This includes both improving sustained performance on a single processor and optimizing communication patterns for a given network and its topology. Optimization can only be done on the production system or on a similar, smaller system.
Code preparation and optimization are supported by standard parallel programming models [20, 21, 22]. These models were designed for shared memory systems (OpenMP) and distributed memory systems (MPI). Shared memory systems are available at the desktop level, and the OpenMP model is a good choice for this type of system, allowing for a reasonable level of optimization within a reasonable amount of time. For distributed memory systems optimization is much more difficult. MPI was designed primarily for completeness rather than for highest performance. With an increasing number of processors, which goes up to 10000 in a single system, the programming and communication overhead of MPI becomes so significant that hardly any applications are known today that are able to exploit large distributed memory systems.

Developing new parallel programming models is therefore one of the challenges for the future.

Input Preparation

Input data and input files have to be generated. They have to be prepared carefully and documented well so that the results can be understood and interpreted. As the available main memory of supercomputers grows, input data sets grow in size, and it becomes increasingly difficult to prepare them on smaller systems. Strategies for pre-processing will therefore have to change in the future. Mesh creation will have to be done in parallel. Starting from a small basic set of information, the application code will have to inflate all input information in parallel.

Computing

The goal of simulation and computing is not to achieve high performance but to get the required answer in acceptable time. Sustained performance is only relevant in as much as it defines the size or complexity of a problem that can be solved in the same time period. In addition to the turn-around time for a single user, overall optimum usage of the system is an important goal for the operator of a supercomputer. Such optimum usage shows up in the price of the resource and has become an economic issue. Today there are no techniques that can guarantee optimum usage of systems with thousands of processors.

Simulation Control

Control over the running simulation becomes an important issue as systems get larger and more expensive. It can be exercised either directly by steering the simulation or by analyzing intermediate results that are already accessible during the simulation. Control is important to reduce the total time needed to achieve the desired result and to make more efficient use of expensive compute resources. Furthermore, as we will see, increasing sizes of data sets make it necessary to change from post-processing to an interactive understanding of the results during the simulation process, something we might call co-processing.
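The idea stated under Input Preparation, that each process inflates its own share of the input from a small basic description, can be illustrated with a minimal Python sketch. It is not the author's method: the uniform one-dimensional mesh, the block distribution and the names local_range and inflate_local_mesh are assumptions made for illustration only. In a real code each process would call the function with its own rank; here the ranks are looped over sequentially.

```python
def local_range(n_global: int, n_ranks: int, rank: int) -> range:
    """Index range owned by `rank` in a block distribution of n_global items."""
    base, extra = divmod(n_global, n_ranks)
    # The first `extra` ranks receive one additional item each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def inflate_local_mesh(description: dict, n_ranks: int, rank: int) -> list:
    """Generate only this rank's node coordinates of a uniform 1D mesh."""
    n = description["n_points"]
    x0, x1 = description["x_min"], description["x_max"]
    dx = (x1 - x0) / (n - 1)
    return [x0 + i * dx for i in local_range(n, n_ranks, rank)]


# Small basic description: three numbers instead of the full coordinate list.
description = {"n_points": 10, "x_min": 0.0, "x_max": 9.0}

# Emulate 4 processes; each one creates only its own part of the mesh.
parts = [inflate_local_mesh(description, 4, r) for r in range(4)]
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

No process ever holds the complete mesh, so the memory needed for input preparation scales with the local share rather than with the full problem size.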
Analysis

With the growing size of main memories more and more fields of simulation move from simple models to more complex ones and from two-dimensional problems to three-dimensional ones. This increase in complexity and number of dimensions does not only result in larger output files. It also makes it more and more difficult for any user to understand the complex phenomena that are hidden in the results. Standard visualization therefore starts to reach its limits in the same way as simple plotting of curves did 15 years ago. More complex techniques to gain insight are required.

Archiving

With increasing compute speed, redoing a simulation may be cheaper than archiving its data. In that case only the input files have to be stored. Computing costs have to be balanced against archiving costs. However, for a variety of application fields archives are of growing importance for both scientific and legal reasons.

A traditional simulation work-flow goes through all these steps sequentially or in an iteration loop. The flow of data that accompanies this work-flow is growing. Therefore, it becomes more and more important to bring the human into the loop, especially when large systems are used for a long time. Interactive control and/or steering of the simulation can help to avoid costs both in terms of money and time. Software has to support this change of paradigm.

2.3 GRID Computing

The Grid is generally seen as a concept for ‘coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations’ [23]. The original idea came from scientists who were mainly interested in the scientific solution of their problem. Their jobs can be executed on any machine, or on any set of machines, to which they have access. As in Metacomputing, which was a popular concept in the mid-90s [24], the idea of distributing jobs onto several machines is an important part of the Grid concept. The need for doing so mainly comes from applications that have a high demand for computational resources [25, 26], and from the wish for increased throughput on the machines and reduced turn-around times. For a long time the Grid was expected to solve all these problems. Distributing a parallel job onto several machines, however, imposes many problems on the user and the application [27, 28, 29, 30].
Most of the problems stem from the fact that the Grid is heterogeneous in several senses. Different data representations on different machines require data conversion either in the communication layer or in the application itself. Varying processor speeds, differences in available memory and the use of shared resources require the application to have a smart initial load distribution as well as dynamic load balancing. The differences in communication characteristics between data exchange inside a system and data exchange between processes located on systems at different sites require special programming techniques for hiding the wide area latency and dealing with the low bandwidth [31, 32, 33, 34].
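The data representation problem mentioned above can be made concrete with a small Python sketch using the standard struct module. It shows one common way of handling byte-order differences, namely agreeing on a fixed wire format (big-endian here) and converting at the communication layer. This is an illustration under that assumption, not a description of any particular Grid library, and the function names are hypothetical.

```python
import struct


def encode_network(values):
    """Pack doubles in big-endian order, independent of the sending host."""
    return struct.pack(f">{len(values)}d", *values)


def decode_network(payload):
    """Unpack big-endian doubles into Python floats on any receiving host."""
    count = len(payload) // 8  # a double occupies 8 bytes in this format
    return list(struct.unpack(f">{count}d", payload))


values = [1.0, -2.5, 3.25]
wire = encode_network(values)

# A receiver that agrees on the wire format recovers the data exactly.
assert decode_network(wire) == values

# A receiver that wrongly assumes little-endian layout reads garbage.
misread = list(struct.unpack(f"<{len(values)}d", wire))
print(misread != values)  # True
```

Doing the conversion in one place keeps the application code free of platform-specific cases, at the price of a conversion step on every transfer.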

Another level of heterogeneity is introduced by the different methods of access to the different machines in a Grid, e.g. ssh [35], UNICORE [36] or Globus GSI [23]. A number of tools were developed or extended to support simulation in GRID environments [37, 38, 39, 40, 41, 42]. However, although a number of GRID environments were set up [23, 43, 44] and a number of projects have shown the feasibility of the GRID concept even in an industrial environment [43, 45, 46], the GRID is a concept for new infrastructure rather than for supercomputing simulation.
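The requirement for a smart initial load distribution on machines of varying speed, raised earlier in this section, can be sketched as follows. The sketch assumes that a single relative speed per machine is known in advance and that work items are of equal cost. Both are simplifications, and the function distribute_load is hypothetical rather than taken from any of the cited tools.

```python
def distribute_load(n_items: int, speeds: list) -> list:
    """Split n_items among machines in proportion to their relative speeds.

    The largest-remainder method is used so that the integer shares
    always sum exactly to n_items.
    """
    total = sum(speeds)
    exact = [n_items * s / total for s in speeds]
    shares = [int(x) for x in exact]
    leftover = n_items - sum(shares)
    # Give the remaining items to the machines with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Three machines, the third being five times as fast as the first.
shares = distribute_load(1000, [1.0, 2.0, 5.0])
print(shares)  # [125, 250, 625]
```

A static split like this only covers the initial distribution. Shared resources whose speed changes at run time still call for the dynamic load balancing mentioned in the text.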

3 A Simulation Workbench Approach

The concept of a simulation workbench was developed at HLRS [47] and follows the work-flow of a typical simulation in science and engineering. It is shown schematically in Fig. 1.

Fig. 1. Workbench concept for simulation in science and engineering

3.1 File System

At the centre of the concept is a file system. During the simulation process the main activity is the manipulation of data. Input data are created and then modified in the simulation process, and finally result data have to be visualized for the end user. Logically, therefore, data are at the heart of the simulation work-flow. In addition, technical limitations require a central file system for a supercomputer with a main memory in the range of several Terabytes, for two reasons.

First, hardly any user can afford to store the result files on her local disk system. Most applications are time-dependent and three-dimensional. When fully using the available memory such simulations typically create files that are of the order of 10 times the main memory size. Hence we have to handle file sizes of 10-100 TB. Even with currently available cheap RAID technologies it is difficult for any single user to store this amount of data locally.

Second, the communication speed of wide area networks does not increase at the same rate as compute speed. While the size of files will be in the range of 10-100 TB, the sustained bandwidth of a wide area network connection is unlikely to exceed 1 Gbit/s for a single user in the near future. Consequently a file transfer would require 80000 to 800000 seconds, or roughly one to ten days.

In order to avoid data transfer, pre- and post-processing have to be fully integrated into the concept. This requires a file system with the following two main characteristics:

Heterogeneity: The file system has to be able to support multiple platforms. Pre- and post-processing systems will typically be cheaper standard computer systems while the supercomputer will be an expensive and very special system. With the growing performance of PCs for visualization, post-processing is moving away from the traditional SGI systems, at least at the low end. With the advent of 64 bit processors on the PC market, limitations in memory size for PCs have vanished, making pre-processing even for large cases easy to do on such cheap systems. Although Windows shows some potential for pre- and post-processing, Linux dominates this market segment for the PC.
Hence, although there are a variety of options for such heterogeneous file systems, the most interesting open software today seems to be the Linux based Lustre project [48].

High Speed I/O: High speed for I/O is not described simply by the number of bytes that can be transferred during read or write operations. Although this is important, achieving reasonable metadata performance is becoming increasingly important. Given the complex concept above, which integrates a collection of hardware systems, software tools and users, management of and fast access to randomly distributed data is a key to success.

3.2 Interconnect

Integrating the supercomputer so tightly with pre- and post-processing systems requires a network solution with features similar to those of the file system. Heterogeneity is a must, while highest performance should not be compromised. Unless the supercomputer is chosen to be a cluster there seems to be no way to fully comply with these requirements. Networks like Myrinet [11] and Quadrics [13] provide a certain level of heterogeneity but still require a PCI bus for the systems to be connected to the network. Vendor specific options focus on highest communication performance, ignoring the potential heterogeneity of an integrated hardware environment. For these latter systems Gigabit Ethernet or Infiniband might be a workaround.
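The transfer-time estimate given in Section 3.1 (10-100 TB over a 1 Gbit/s wide area connection) can be reproduced with a few lines of Python. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 Gbit/s = 10^9 bit/s) and a link that sustains its full nominal bandwidth without protocol overhead, so the real figures would be somewhat worse. The function name is illustrative.

```python
def transfer_time_seconds(size_terabytes: float, bandwidth_gbit_s: float) -> float:
    """Idealized transfer time, assuming decimal units and no protocol overhead."""
    bits = size_terabytes * 1e12 * 8          # TB -> bytes -> bits
    return bits / (bandwidth_gbit_s * 1e9)    # Gbit/s -> bit/s


for size_tb in (10, 100):
    seconds = transfer_time_seconds(size_tb, 1.0)
    print(f"{size_tb} TB: {seconds:.0f} s = {seconds / 86400:.1f} days")
# 10 TB: 80000 s = 0.9 days
# 100 TB: 800000 s = 9.3 days
```

Even under these optimistic assumptions a single result file occupies the link for days, which is the argument for keeping pre- and post-processing on the same file system as the supercomputer.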

4 Integrated Simulation

With integrated simulation we describe a concept in which the scientist or engineer is part of the simulation loop and can interactively modify whatever parameters might influence the simulation. This is achieved by integrating all steps of the work-flow into a single simulation environment such as COVISE [49] or any other tool. The key is that the user does not have to move back and forth between pre-processing, simulation and post-processing but is in control of the work-flow of the simulation process at any time. Based on the workbench concept a new approach to simulation in science and engineering is possible. In the following we give two examples of such simulations and show how they can be supported by our workbench approach.

4.1 Blood Flow Simulation

Here we present an example from the field of applied medical simulations [50, 51, 52] for the use of an integrated workbench. About 2% of the elderly population suffer from a so-called abdominal aortic aneurysm (AAA). An aneurysm is a dilatation of a blood vessel. Once initiated, the dilatation may continue until the aneurysm leaks or ruptures. When such a rupture occurs the chances of dying from internal bleeding are extremely high. Although new surgical methods have brought the mortality rate down to currently about 20%, there is still a lot of room for improvement. One way to treat an AAA is to implant a stent graft, using endovascular methods, in order to channel the blood on its way through the aneurysm. In this procedure the surgeon delivers the stent graft via a catheter into the dilatation, where it unfolds, taking the pressure off the weakened aortic wall. This method has been proven to work well, and patients can typically be released as early as 24 hours after receiving surgical treatment. However, complications may occur later, which can include leakages and migration of the stent or even its elemental breakdown.
The causes of these problems are not exactly known. Simulation is a feasible approach to gaining a better understanding of the behaviour of these complex mechanical systems [53, 54]. This requires adequate data gathering for the individual patient, feasible mathematical and numerical models, and substantial compute performance.

Fig. 2. Integrated simulation of blood flow in large arteries

Following the work-flow for a simulation of an AAA we encounter the following steps:

Data Gathering: For a patient, data gathering is done using Computed Tomography (CT) or Magnetic Resonance Imaging (MRI) scanners. In both cases hundreds or even thousands of two-dimensional images are created that, when put together, give a three-dimensional representation of the scanned human body. These two-dimensional images are the basis for any further investigation, just as a CAD file is the basis for any further simulation of a car or plane in an engineering application.

3D-Reconstruction: From the two-dimensional images we first have to extract the artery of interest. With both CT and MRI the artery can be made visible. However, in both cases it is difficult to find clear boundaries of the arterial geometry. One reason is that the artery moves while the patient is being scanned. This is due to the heart beat and breathing of the patient, neither of which can be stopped during the scanning process. Furthermore it is extremely difficult to assemble two-dimensional images into a three-dimensional geometry. Methods for this do exist but require a lot of computational effort.

In the case of a CAD representation the problems are similar. Here we have the problem of disconnected surfaces created from patches. A CAD design does not strictly require that surfaces are closed and edges meet exactly. Nor does it require that corners of neighbouring patches fall exactly onto each other so that they have the same physical coordinates. It is quite normal that gaps and discontinuities exist, which have to be repaired in order to achieve a closed three-dimensional representation.

Mesh Generation: Mesh generation is a problem in itself. With the growing size of main memory on massively parallel systems the creation of a mesh ceases to be a static process. So far a typical pre-processing system has required about 10% of the memory of the production system in order to be able to create a full mesh for a simulation. Massively parallel systems have up to 10000 processors and easily provide main memory in the range of 10-50 Terabytes. A pre-processing system for such a machine would require 1-5 Terabytes and would be a supercomputer in itself. This is no longer feasible. Mesh generation has to be turned into a dynamic process. Only a small basic mesh can be created on the pre-processing system. The full mesh then has to be created in parallel, distributed across the overall system. Mesh refinement will no longer be only a way to respond to the numerical results of the simulation but also a tool to dynamically create a feasible initial mesh for large systems.

Simulation: During these three steps data have to be moved close to the supercomputer. When scanning a patient it is feasible to transfer only a filtered part of the two-dimensional pictures to the local file system, since hospitals too will not have extreme bandwidths at their fingertips in the future. All further pre-processing can be done on the local file system. After this is finished the actual simulation works on the same data.
Results of the simulation, both intermediate and final, are again stored on the same file system and are available for visualization and understanding already during the simulation process.

Visualization: Visualization of complex data has grown beyond understanding numbers or two-dimensional pictures. Three-dimensional representations have become a must to get an intuitive understanding of the processes involved. However, even this is not enough when time-dependent phenomena are involved. Dynamic visualization in virtual reality environments is required to fully grasp time-dependent three-dimensional phenomena. This will not require expensive settings like "caves" or "holobenches" but will be possible on special monitors at the engineer’s desk. Nevertheless, any kind of post-processing system has to be adequately connected to the simulation resources.

4.2 Coupled Simulation

The blood flow simulation described above may require a coupling of computational fluid dynamics and structural mechanics in order to accurately describe the behaviour of the artery. Such a multi-physics approach is also becoming more important in engineering [54, 55, 56]. Coupling typically involves various software modules, each capable of solving one specific type of physical problem. In order to couple these modules most approaches currently use a weak coupling. After one iteration step of module one, data are exchanged with module two. Based on these data, module two computes one iteration step and returns its data to module one. The exchange described here can be a rather simple one via files, or a more complex one via messages that are explicitly exchanged between both modules. For the exchange based on files a common fast file system as described above is mandatory. The situation gets more difficult when we use special hardware for each of the modules. In a fluid-structure interaction simulation from the German aerospace industry the fluid part was well optimized for massively parallel systems while the optimization part for the structural layout was best suited for vector supercomputers [56]. As a result it was best to run the CFD part on a massively parallel system while the structural simulation was done on a vector system. In such a case heterogeneity is introduced by the nature of the problem. When exchanging data via files we need a file system that supports heterogeneity, otherwise the loss of performance due to slow writing and reading is unacceptably high. A direct exchange of data between individual modules based on some standard communication protocol improves the situation considerably. Since the Message Passing Interface (MPI) [20, 21] has been established as a standard, a number of projects have aimed at implementing it for heterogeneous platforms. Such libraries take care of the communication part in a workbench as described above [32, 33, 34]. Some very good results for such an approach were achieved in the European project DAMIEN [43].
Similar results were achieved in a Japanese project [57]. In the latter case a fluid-structure coupling was done on two supercomputers that were about 100 kilometers apart. Such simulations are only possible if the resources are integrated by software into a workbench as described above.
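The weak coupling scheme described in this section, in which module one performs an iteration step, hands its data to module two, and receives data back, can be sketched in Python as follows. The two "modules" are toy scalar models invented for illustration: their coefficients have no physical meaning, and the data exchange is an ordinary function call rather than a file or an MPI message.

```python
def fluid_step(displacement: float) -> float:
    """Toy module one: wall pressure for a given wall displacement."""
    return 1.0 - 0.5 * displacement


def structure_step(pressure: float) -> float:
    """Toy module two: wall displacement for a given pressure load."""
    return 0.4 * pressure


def weak_coupling(n_steps: int):
    """Alternate the two modules, exchanging data after every iteration step."""
    pressure, displacement = 0.0, 0.0
    for _ in range(n_steps):
        # Module one iterates using the last data received from module two.
        pressure = fluid_step(displacement)
        # Module two iterates using the data just received from module one.
        displacement = structure_step(pressure)
    return pressure, displacement


p, d = weak_coupling(50)
print(round(p, 6), round(d, 6))  # 0.833333 0.333333
```

In this toy case the alternation settles on the values that satisfy both models at once (p = 5/6, d = 1/3). With real solvers each call would be an expensive parallel computation, which is why the speed of the exchange between the modules matters so much.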

5 Conclusion

High Performance Computing has become a standard tool for simulation. This provides both science and engineering with a growing number of options in simulation and modelling. However, with the growing speed of computers, the growing number of processors involved and the growing size of main memories for large systems, a number of challenges come up.

Cheap standard components are put together to form massively parallel systems with an ever growing peak performance. These systems are error prone and require excellent system software. This does not only mean operating systems and management tools. Users require an integrated software approach that hides the complexity of the system, at least at the level of management. A must for such configurations is a file system capable of both high bandwidth and support for heterogeneous platforms. Based on such file systems the complexity of today's computer landscape can partly be hidden without too much loss of performance.

All this converges towards a workbench approach that is data-driven. Such an approach reflects the work-flow in science and engineering. Although a lot of work towards such a workbench has already been done, a number of research issues remain. File systems are faced with both hardware and software bottlenecks that have to be overcome. On the other hand, access of the user to the system should become more seamless but at the same time more secure. Especially in an industrial setting the balance between ease of use and confidentiality will be very hard to keep.

The key to success for all these approaches is to make the scientist and the engineer part of the process and to get them directly involved. Interactive usage and a responsive attitude become mandatory to increase productivity, both in terms of quality of results and quantity of simulations performed. The workbench concept that will be set up over the next two years at the High Performance Computing Center Stuttgart (HLRS) aims at achieving these goals. If we succeed in putting the human in the loop, the new large supercomputers will again become useful tools in the hands of scientists and engineers in the years to come.

References

1. Accelerated Strategic Computing Initiative (ASCI) http://www.llnl.gov/asci/
2. Ray U (2003) The EDM Strategy of Mercedes Car Group Development, DaimlerChrysler Electronic Datamanagement Forum 2003 - Global Engineering, Böblingen, Germany
3. Resch M, Bönisch T, Berger H (1997) Performance of MPI on a Cray T3E. In: Third European CRAY-SGI MPP Workshop, Paris, France
4. Turcotte LH (1993) A Survey of Software Environments for Exploiting Networked Computing Resources. Report MSU-EIRS-ERC-93-2. NSF Engineering Research Center for Computational Field Simulation, Mississippi State University, Starkville, MS
5. Riesen R, Brightwell R, Fisk LA, Hudson T, Otto J (1999) Cplant. In: Proceedings of the Second Extreme Linux Workshop, Monterey, California
6. Real World Computing Project http://www.rwcp.or.jp/home-E.html
7. Sato M, Tanaka Y, Matsuda M, Kubota K (1998) COMPaS: A Pentium Pro PC-based SMP Cluster. In: Proceedings of the 1998 RWC Symposium (RWC Technical report, TR-98001)
8. Nishimura S, Kudoh T, Nishi H, Harasawa K, Matsudaira N, Akutsu S, Tasyo K, Amano H (1999) A network switch using optical interconnection for high performance parallel computing using PCs. In: Proceedings of the Sixth International Conference on Parallel Interconnects, Anchorage

9. Resch M (2002) Clusters in Grids: Power plants for CFD, In: Wilders P, Ecer A, Periaux J, Satofuka N, Fox P (eds) Parallel Computational Fluid Dynamics. Practice and Theory. Elsevier, North-Holland
10. Joseph E, Kaumann N, Willard CG (2003) The AMD Opteron Processor: A New Alternative for Technical Computing, White Paper, IDC, November
11. Myrinet http://www.myri.com/myrinet/overview/index.html
12. Prylli L, Tourancheau B, Westrelin R (1999) The Design for a High Performance MPI Implementation on the Myrinet Network, In: Dongarra J et al. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. Proceedings of the 6th European PVM/MPI Users’ Group Meeting, EuroPVM/MPI’99, LNCS 1697, Barcelona, Spain
13. Quadrics http://www.quadrics.com/
14. Infiniband http://www.infinibandta.org/home
15. The Earth Simulator Project http://www.es.jamstec.go.jp/
16. TOP 500 list http://www.top500.org
17. Oak Ridge National Laboratories http://www.csm.ornl.gov/PR/OR02-25-03.html
18. Fahey M, White J (2003) DOE Ultrascale Evaluation Plan of the Cray X1, Cray User Group Meeting 2003, Columbus, Ohio, USA
19. Koblenz B (2003) Cray Red Storm, Cray User Group Meeting 2003, Columbus, Ohio, USA
20. MPI Forum (1995) MPI: A Message-Passing Interface Standard. Document for a Standard Message-Passing Interface, University of Tennessee
21. MPI Forum (1997) MPI2: Extensions to the Message-Passing Interface Standard. Document for a Standard Message-Passing Interface, University of Tennessee
22. OpenMP Standard Definition http://www.openmp.org/
23. Foster I, Kesselman C, Tuecke S (2001) Int J Supercomp Appl 15(3)
24. Catlett C, Smarr L (1992) Metacomputing, Comm ACM 35(6):44–52
25. Allen G, Dramlitsch T, Foster I, Karonis NT, Ripeanu M, Seidel E, Toonen B (2001) Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus. In: Supercomputing 2001, Denver, USA
26. Gabriel E, Lange M, Rühle R (2001) Direct Numerical Simulation of Turbulent Reactive Flows in a Metacomputing Environment. In: Proceedings of the 2001 ICPP Workshops
27. Barberou N, Garbey M, Hess M, Resch M, Rossi T, Toivanen J, Tromeur-Dervout D (2003) J Parall Distr Comp 63(5):564–577
28. Barberou N, Garbey M, Hess M, Resch M, Toivanen J, Rossi T, Tromeur-Dervout D (2002) Aitken-Schwarz method for efficient metacomputing of elliptic equations. In: Proceedings of the Fourteenth Domain Decomposition meeting in Cocoyoc, Mexico
29. Bönisch TB, Rühle R (2001) Efficient Flow Simulation with Structured Multiblock Meshes on Current Supercomputers. In: ERCOFTAC Bulletin No. 50: Parallel Computing in CFD
30. Pickles SM, Brooke JM, Costen FC, Gabriel E, Müller M, Resch M, Ord SM (2001) Future Generation Comp Syst 17:911–918
31. Fagg GE, London KS, Dongarra JJ (1998) MPI Connect Managing Heterogeneous MPI Applications Interoperation and Process Control, In: Alexandrov V, Dongarra J (eds) Recent advances in Parallel Virtual Machine and Message Passing Interface, LNCS 1497, Springer

32. Gabriel E, Resch M, Beisel T, Keller R (1998) Distributed Computing in a Heterogeneous Computing Environment, In: Alexandrov V, Dongarra J (eds) Recent advances in Parallel Virtual Machine and Message Passing Interface, LNCS 1497, Springer
33. Imamura T, Tsujita Y, Koide H, Takemiya H (2000) An Architecture of Stampi: MPI Library on a Cluster of Parallel Computers, In: Dongarra J, Kacsuk P, Podhorszki N (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface, LNCS 1908, Springer 200–207
34. Karonis N, Toonen B. MPICH-G2, http://www.niu.edu/mpi
35. Mindterm Secure Shell http://www.mindbright.se
36. Almond J, Snelling D (1998) UNICORE: Secure and Uniform Access to Distributed Resources, http://www.unicore.org, A White Paper, October
37. Brunst H, Winkler M, Nagel WE, Hoppe H-C (2001) Performance optimization for large scale computing: The scalable vampir approach, In: Alexandrov VN, Dongarra JJ, Juliano BA, Renner RS, Tan CK (eds) Computational Science – ICCS 2001, Part II, LNCS 2074, Springer
38. Brunst H, Gabriel E, Lange M, Müller MS, Nagel WE, Resch MM (2003) Performance Analysis of a Parallel Application in the GRID. In: ICCS Workshop on Grid Computing for Computational Science, St. Petersburg, Russia
39. Casanova H, Dongarra J (1997) Int J Supercomp Appl High Perf Comp 11(3):212–223
40. Girona S, Labarta J, Badia RM (2000) Validation of Dimemas communication model for MPI collective communications, In: Dongarra J, Kacsuk P, Podhorszki N (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface, LNCS 1908, Springer
41. Hackenberg MG, Redler R, Post P, Steckel B (2000) MpCCI, multidisciplinary applications and multigrid, Proceedings ECCOMAS 2000, CIMNE, Barcelona
42. Lindner P, Currle-Linde N, Resch MM, Gabriel E (2002) Distributed Application Management in Heterogeneous Grids. In: Proceedings of the Euroweb Conference, Oxford, UK
43. Müller M, Gabriel E, Resch M (2002) A Software Development Environment for Grid-Computing, Concurrency Comput Pract Exp 14:1543–1551
44. Gabriel E, Keller R, Lindner P, Müller MS, Resch MM (2003) Software Development in the Grid: The DAMIEN tool-set. In: International Conference on Computational Science, St. Petersburg, Russia
45. EUROGRID http://www.eurogrid.org
46. DAMIEN – Distributed Application and Middleware for Industrial Use of European Networks, http://www.hlrs.de/organization/pds/projects/damien
47. Resch MM, Müller M, Küster U, Lang U (2003) A Workbench for Teraflop Supercomputing. In: Supercomputing in Nuclear Applications 2003, Paris, France
48. The Lustre Project http://www.lustre.org
49. Lang U, Peltier JP, Christ P, Rill S, Rantzau D, Nebel H, Wierse A, Lang R, Causse S, Juaneda F, Grave M, Haas P (1995) Fut Gen Comp Sys 11:419–430
50. Garbey M, Resch MM, Vassilevski Y, Sander B, Pless D, Fleiter TR (2002) Stent Graft Treatment Optimization in a Computer Guided Simulation Environment. In: The Second Joint Meeting of the IEEE Engineering in Medicine and Biology Society and Biomedical Engineering Society, Houston, TX, USA
51. Resch MM, Garbey M, Sander B, Küster U (2002) Blood flow simulation in a GRID environment. In: Parallel CFD Conference, Kansai, Japan

46

M. Resch

52. Sander B, K¨ uster U, Resch MM (2002) Towards a Transient Blood Flow Simulation with Fluid Structure Interaction, In: Valafar F et al. (eds) Proceedings of the 2002 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences METMBS’02, CSREA Press 53. Perktold K, Peter RO, Resch M, Langs G (1991) J Biomed Eng 13(6):507–515 54. Sander B, Pless D, Fleiter TR, Resch MM (2001) Computational Fluid Dynamics (CFD): coupled solving of CFD and structural mechanics in aneurysms and stentgrafts having regard to the elastic behaviour of the aortic wall and varying positions of the stentgraft. In: 9th Annual Medicine Meets Virtual Reality Medical Conference, Newport Beach/California 55. Adamidis P, Resch MM (2003) Parallel Coupled Thermomechanical Simulation using Hybrid Domain Decomposition. In: The 2003 international conference on computational science and its application (ICCSA), 2003, Montreal, Canada 56. Rieger H, Fornasier L, Haberhauer S, Resch MM (1996) Pilot Implementation of an Aerospace Design System into a Parallel User Simulation Environment, In: Liddell H, Colbrok A, Hertzberger P, Sloot P (eds), LNCS 1067, Springer 57. Kimura T, Takemiya H (1998) Local Area Metacomputing for Multidisciplinary Problems: A Case Study for Fluid/Structure Coupled Simulation. In: 12th ACM International Conference on Supercomputing, Melbourne/Australia

Completely splitting method for the Navier-Stokes problem

I.V. Kireev¹, U. Rüde², and V.V. Shaidurov³

¹ Institute of Computational Modelling SB RAS, Academgorodok, 630090 Krasnoyarsk, Russia, [email protected]
² University of Erlangen–Nuremberg, Cauerstraße 6, 91058 Erlangen, Germany, [email protected]
³ Institute of Computational Modelling SB RAS, Academgorodok, 630090 Krasnoyarsk, Russia, [email protected]

Summary. We consider the two-dimensional time-dependent Navier-Stokes equations in a rectangular domain and study the method of full splitting [3]-[4]. On the physical level, the problem is split into two processes: convection-diffusion and the action of pressure. The convection-diffusion step is further split along the two geometric directions. To implement the finite element method, we use uniform square grids which are staggered relative to one another. This allows the Ladyzhenskaya-Babuška-Brezzi condition for stability of pressure to be fulfilled without the usual reduction of the number of degrees of freedom for pressure relative to that for velocities. For pressure we take piecewise constant finite elements; for velocities we use piecewise bilinear elements.

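1 The formulation of the problem and the splitting into physical processes

In the rectangular domain $\Omega = (0, 1) \times (0, 1)$ with the boundary $\Gamma$ we consider the two-dimensional Navier-Stokes equation
\[ \frac{\partial \mathbf{u}}{\partial t} - \frac{1}{Re}\,\Delta \mathbf{u} + (\mathbf{u} \cdot \nabla)\mathbf{u} + \nabla p = \mathbf{f} \quad \text{in } \Omega \times (0, T), \tag{1} \]
the continuity equation
\[ \nabla \cdot \mathbf{u} = 0 \quad \text{in } \Omega \times (0, T), \tag{2} \]
the boundary condition
\[ \mathbf{u} = \mathbf{g} \quad \text{on } \Gamma \times [0, T], \tag{3} \]
and the initial condition
\[ \mathbf{u}(x, y, 0) = \mathbf{u}_0(x, y) \quad \text{on } \Omega. \tag{4} \]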


Here $\mathbf{u}(x, y, t) = (u_1(x, y, t), u_2(x, y, t))$ is the unknown velocity vector-function; $p(x, y, t)$ is the unknown pressure function; $\mathbf{f}(x, y, t) = (f_1(x, y, t), f_2(x, y, t))$ is a given vector-function; $\mathbf{g}(x, y, t) = (g_1(x, y, t), g_2(x, y, t))$ is a given continuous vector-function on $\Gamma \times [0, T]$; $\mathbf{u}_0(x, y) = (u_{0,1}(x, y), u_{0,2}(x, y))$ is a given continuous vector-function on $\Omega$; $Re$ is the Reynolds number. If these equations have a solution $\mathbf{u}, p$, then the pair $\mathbf{u}, p + c$ is also a solution for any constant $c$. To exclude this non-uniqueness we require that
\[ \int_\Omega p \, d\Omega = 0. \tag{5} \]

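Rewrite the vector equation (1) in the form of two scalar ones. Put $\nu = 1/Re$ and replace the third term of (1) by an equivalent sum of two expressions on account of the continuity equation:
\[ \frac{\partial u_1}{\partial t} - \nu \Delta u_1 + \frac{1}{2}(\mathbf{u} \cdot \nabla)u_1 + \frac{1}{2}\operatorname{div}(u_1 \mathbf{u}) + \frac{\partial p}{\partial x} = f_1, \tag{6} \]
\[ \frac{\partial u_2}{\partial t} - \nu \Delta u_2 + \frac{1}{2}(\mathbf{u} \cdot \nabla)u_2 + \frac{1}{2}\operatorname{div}(u_2 \mathbf{u}) + \frac{\partial p}{\partial y} = f_2. \tag{7} \]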
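The equivalence used in (6) and (7) follows from the product rule. Written out for the first component (the second is analogous), and using only the continuity equation (2):

\[ \operatorname{div}(u_1 \mathbf{u}) = (\mathbf{u} \cdot \nabla)u_1 + u_1 \operatorname{div}\mathbf{u} = (\mathbf{u} \cdot \nabla)u_1 \quad\Longrightarrow\quad (\mathbf{u} \cdot \nabla)u_1 = \frac{1}{2}(\mathbf{u} \cdot \nabla)u_1 + \frac{1}{2}\operatorname{div}(u_1 \mathbf{u}). \]

The symmetrized form matters for the discrete problem, where the continuity equation holds only approximately: the half-sum keeps the convective operator skew-symmetric regardless of the divergence of the transporting field.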

For the obtained problem (2)–(7) we first consider Chorin's splitting method (the method of fractional steps) [3]-[4] into two physical processes: transfer with diffusion of substance, and pressure action. To this end, the time interval $[0, T]$ is divided into $m$ equal segments of length $\tau = T/m$ by the nodes of the time grid
\[ \bar\omega_\tau = \{t_k : t_k = k\tau,\ k = 0, 1, \dots, m\} \quad \text{and} \quad \omega_\tau = \bar\omega_\tau \setminus \{0\}. \]
Instead of the exact functions $p$ and $\mathbf{u}$ we seek a function $p^\tau_k(x, y)$ and a vector-function $\mathbf{u}^\tau_k(x, y) = (u^\tau_{1,k}(x, y), u^\tau_{2,k}(x, y))$ which are determined at the discrete instant of time $t = k\tau$. First we use the condition (4) and put
\[ \mathbf{u}^\tau_0(x, y) = \mathbf{u}_0(x, y) \quad \text{in } \Omega. \tag{8} \]

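Then we construct a sequence of problems alternating on every segment $[t_k, t_{k+1}]$. The first two problems, for $s = 1$ and $s = 2$, are not connected with each other and determine the vector-function $\mathbf{v}(x, y, t) = (v_1(x, y, t), v_2(x, y, t))$:
\[ \frac{\partial v_s}{\partial t} - \nu \Delta v_s + \frac{1}{2}(\mathbf{u}^\tau_k \cdot \nabla)v_s + \frac{1}{2}\operatorname{div}(v_s \mathbf{u}^\tau_k) = \frac{1}{2} f_s \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{9} \]
\[ v_s = g_s \quad \text{on } \Gamma \times [t_k, t_{k+1}], \tag{10} \]
\[ v_s(x, y, t_k) = u^\tau_{s,k}(x, y) \quad \text{in } \Omega. \tag{11} \]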

After this, the function obtained at the time level $t_{k+1}$ is used as the initial value for another problem, which determines the vector-function $\mathbf{w}(x, y, t) = (w_1(x, y, t), w_2(x, y, t))$ and the function $q(x, y, t)$ on the same segment $[t_k, t_{k+1}]$:
\[ \frac{\partial \mathbf{w}}{\partial t} + \nabla q = \frac{1}{2}\mathbf{f} \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{12} \]
\[ \operatorname{div} \mathbf{w} = 0 \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{13} \]
\[ \mathbf{w} \cdot \mathbf{n} = \mathbf{g} \cdot \mathbf{n} \quad \text{on } \Gamma \times (t_k, t_{k+1}), \tag{14} \]
\[ \mathbf{w}(x, y, t_k) = \mathbf{v}(x, y, t_{k+1}) \quad \text{in } \Omega, \tag{15} \]

where $\mathbf{n}(x, y) = (n_1(x, y), n_2(x, y))$ is the vector of the outer normal to the boundary $\Gamma$ at a point $(x, y) \in \Gamma$, which is redefined at the vertices of the square. The solution of the splitting problem at the time point $t_{k+1}$ is the result of one loop on the segment $[t_k, t_{k+1}]$:
\[ \mathbf{u}^\tau_{k+1}(x, y) = \mathbf{w}(x, y, t_{k+1}), \tag{16} \]
\[ p^\tau_{k+1}(x, y) = q(x, y, t_{k+1}) \quad \text{in } \Omega. \tag{17} \]

Repeating this computation loop for $k = 0, \dots, m - 1$, we sequentially obtain the values of the functions $\mathbf{u}^\tau$ and $p^\tau$ at the time levels $\tau, \dots, T$.

Remark 1. Note the change of the boundary condition (14) in comparison with (3). The substitution is necessary because the condition $\mathbf{w} = \mathbf{g}$ on $\Gamma \times [t_k, t_{k+1}]$ gives an overdetermined problem. □
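The computation loop described above can be sketched in code. This is a minimal skeleton under stated assumptions, not the authors' implementation: the names `run_splitting`, `convection_diffusion` and `pressure_action` are hypothetical, and the two callbacks stand for solvers of the problems (9)–(11) and (12)–(15) that the caller is assumed to supply.

```python
def run_splitting(u0, m, T, convection_diffusion, pressure_action):
    """Advance the split scheme over m time steps of length tau = T / m.

    convection_diffusion(u, t0, t1) -> v      # hypothetical solver for (9)-(11)
    pressure_action(v, t0, t1)      -> (w, q) # hypothetical solver for (12)-(15)
    """
    tau = T / m
    u, p = u0, None                            # initial condition (8)
    for k in range(m):
        t0, t1 = k * tau, (k + 1) * tau
        v = convection_diffusion(u, t0, t1)    # first fractional step
        u, p = pressure_action(v, t0, t1)      # second step, then (16)-(17)
    return u, p
```

Keeping the two fractional steps behind separate callbacks mirrors the structure of the method: each step can be discretized and tested on its own.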

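2 Discretization of the fractional step of pressure action

Consider the problem (12)–(15) and carry out first the time discretization and then the space discretization. The time discretization replaces the derivative $\partial/\partial t$ by the difference ratio, for example
\[ \frac{\partial w_1}{\partial t}(x, y, t) \approx \frac{w_1(x, y, t) - w_1(x, y, t - \tau)}{\tau}. \tag{18} \]
After moving the known terms to the right-hand side we obtain the stationary differential problem at the time level $t_{k+1}$:
\[ \frac{1}{\tau} w_1^{k+1} + \frac{\partial q^{k+1}}{\partial x} = \frac{1}{2} f_1^{k+1} + \frac{1}{\tau} w_1^k \quad \text{in } \Omega, \tag{19} \]
\[ \frac{1}{\tau} w_2^{k+1} + \frac{\partial q^{k+1}}{\partial y} = \frac{1}{2} f_2^{k+1} + \frac{1}{\tau} w_2^k \quad \text{in } \Omega, \tag{20} \]
\[ \frac{\partial w_1^{k+1}}{\partial x} + \frac{\partial w_2^{k+1}}{\partial y} = 0 \quad \text{in } \Omega \tag{21} \]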


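with the boundary condition
\[ \mathbf{w}^{k+1} \cdot \mathbf{n} = \mathbf{g}^{k+1} \cdot \mathbf{n} \quad \text{on } \Gamma. \tag{22} \]
From here on, for an arbitrary function the notation $u^k$ means $u(t_k)$.

For the space discretization we apply the finite element method, so we turn to the generalized formulation. Consider three arbitrary functions $v_1(x, y)$, $v_2(x, y)$, $r(x, y)$, the first two of which satisfy the boundary condition
\[ v_1 n_1 + v_2 n_2 = 0 \quad \text{on } \Gamma. \tag{23} \]
Multiply the equations (19)–(21) by $v_1$, $v_2$, $r$ respectively, sum them, integrate by parts over $\Omega$, and apply the condition (23). As a result, we obtain
\[
\begin{aligned}
&\frac{1}{\tau}(w_1^{k+1}, v_1)_\Omega + \frac{1}{\tau}(w_2^{k+1}, v_2)_\Omega - \left(q^{k+1}, \frac{\partial v_1}{\partial x}\right)_\Omega - \left(q^{k+1}, \frac{\partial v_2}{\partial y}\right)_\Omega + \left(\frac{\partial w_1^{k+1}}{\partial x}, r\right)_\Omega + \left(\frac{\partial w_2^{k+1}}{\partial y}, r\right)_\Omega \\
&\quad = \frac{1}{2}(f_1^{k+1}, v_1)_\Omega + \frac{1}{2}(f_2^{k+1}, v_2)_\Omega + \frac{1}{\tau}(w_1^k, v_1)_\Omega + \frac{1}{\tau}(w_2^k, v_2)_\Omega,
\end{aligned} \tag{24}
\]
where $(\cdot, \cdot)_\Omega$ means the scalar product
\[ (u, v)_\Omega = \int_\Omega u v \, d\Omega. \]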

In this paper we shall occasionally use a method of small fictitious domains (near the boundary). First, let us introduce the domain $\Omega_1 = (0, 1) \times (-h/2, 1 + h/2)$ and divide it into $n(n + 1)$ squares $e_{i+1/2,j} = (x_i, x_{i+1}) \times (y_{j-1/2}, y_{j+1/2})$ by the lines
\[ x_i = ih, \quad i = 0, \dots, n; \qquad y_{j+1/2} = (j + 1/2)h, \quad j = -1, \dots, n. \]
For $v_1$, $w_1^{k+1}$ we introduce the space $H_x$ of admissible functions which are continuous on $\bar\Omega_1$ and bilinear on each $e_{i+1/2,j} \subset \Omega_1$. The degrees of freedom of these functions are referred to the nodes $z_{i,j+1/2} = (x_i, y_{j+1/2})$. We denote the set of these nodes
\[ \bar\Omega_1^h = \{z_{i,j+1/2} : i = 0, \dots, n,\ j = -1, \dots, n\} \quad \text{and} \quad \Omega_1^h = \bar\Omega_1^h \cap \Omega. \]
Then as the basis function corresponding to the node $z_{i,j+1/2}$ we take $\varphi_{x,i,j+1/2} \in H_x$ which equals 1 at $z_{i,j+1/2}$ and 0 at any other node of $\bar\Omega_1^h$.

Fig. 1. Nodes $\bar\Omega_1^h$ of degrees of freedom for the first component of velocity (marked by sign •)

Fig. 2. Basis function $\varphi_{x,i,j+1/2}$ for the first component of velocity

The arrangement of the nodes of $\bar\Omega_1^h$ and some basis functions from $H_x$ are represented in Figs. 1 and 2.

Second, let us introduce the domain $\Omega_2 = (-h/2, 1 + h/2) \times (0, 1)$ and divide it into $n(n + 1)$ squares $e_{i,j+1/2} = (x_{i-1/2}, x_{i+1/2}) \times (y_j, y_{j+1})$ by the lines
\[ x_{i+1/2} = (i + 1/2)h, \quad i = -1, \dots, n; \qquad y_j = jh, \quad j = 0, \dots, n. \]
For $v_2$, $w_2^{k+1}$ we introduce the space $H_y$ of admissible functions which are continuous on $\bar\Omega_2$ and bilinear on each $e_{i,j+1/2} \subset \Omega_2$. The degrees of freedom of these functions are referred to the nodes $z_{i+1/2,j} = (x_{i+1/2}, y_j)$. We denote the set of these nodes
\[ \bar\Omega_2^h = \{z_{i+1/2,j} : i = -1, \dots, n,\ j = 0, \dots, n\} \quad \text{and} \quad \Omega_2^h = \bar\Omega_2^h \cap \Omega. \]
Then as the basis function corresponding to the node $z_{i+1/2,j}$ we take $\varphi_{y,i+1/2,j} \in H_y$ which equals 1 at $z_{i+1/2,j}$ and 0 at any other node of $\bar\Omega_2^h$.

Fig. 3. Nodes $\bar\Omega_2^h$ of degrees of freedom for the second component of velocity (marked by sign ∗)

Fig. 4. Basis function $\varphi_{y,i+1/2,j}$ for the second component of velocity

Fig. 5. Nodes $\bar\Omega_3^h$ of degrees of freedom for pressure

Fig. 6. Basis function $\varphi_{p,i+1/2,j+1/2}$ for pressure

The arrangement of the nodes of $\bar\Omega_2^h$ and some basis functions from $H_y$ are represented in Figs. 3 and 4.

Finally, let us introduce the domain $\Omega_3 = (-h, 1 + h) \times (-h, 1 + h)$ and divide it into $(n + 2)^2$ squares $e_{i+1/2,j+1/2} = (x_i, x_{i+1}) \times (y_j, y_{j+1})$ by the lines $x_i = ih$, $i = -1, \dots, n + 1$; $y_j = jh$, $j = -1, \dots, n + 1$.


For $r$, $q^{k+1}$ we introduce the space $H_p$ of admissible functions from $L_2(\Omega)$ which are constant on each $e_{i+1/2,j+1/2} \subset \Omega_3$. The degrees of freedom of these functions are referred to the nodes $z_{i+1/2,j+1/2} = (x_{i+1/2}, y_{j+1/2})$. We denote the set of these nodes
\[ \bar\Omega_3^h = \{z_{i+1/2,j+1/2} : i = -1, \dots, n,\ j = -1, \dots, n\} \quad \text{and} \quad \Omega_3^h = \bar\Omega_3^h \cap \Omega. \]
Then as the basis function corresponding to the node $z_{i+1/2,j+1/2}$ we take $\varphi_{p,i+1/2,j+1/2} \in H_p$ which equals 1 at $z_{i+1/2,j+1/2}$ and 0 at any other node of $\bar\Omega_3^h$. The arrangement of the nodes of $\bar\Omega_3^h$ and some basis functions from $H_p$ are represented in Figs. 5 and 6.

Introduce the grid boundary $\Gamma^h$ as the set of midpoints of boundary edges
\[ \Gamma^h = (\bar\Omega_1^h \cup \bar\Omega_2^h) \cap \Gamma, \]
and introduce also the scalar product for vector-functions
\[ (\mathbf{u}, \mathbf{f})_\Omega = \int_\Omega (u_1 f_1 + u_2 f_2) \, d\Omega. \]
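The three staggered node sets can be enumerated directly from their definitions. The sketch below is illustrative only: the function name `staggered_nodes` and the dictionary layout are hypothetical, and it assumes $h = 1/n$, which is implied by the lines $x_i = ih$ covering $(0, 1)$. Nodes are filtered on their (half-)integer indices rather than on coordinates, so the strict inequalities defining $\Omega$ are not affected by rounding.

```python
import numpy as np

def staggered_nodes(n):
    """Coordinates of the closed and open node sets for u1, u2 and pressure."""
    h = 1.0 / n                                      # assumed mesh size
    whole = np.arange(0, n + 1, dtype=float)         # indices i = 0..n
    half = np.arange(-1, n + 1, dtype=float) + 0.5   # indices i+1/2, i = -1..n

    def grid(ix, iy):
        I, J = np.meshgrid(ix, iy, indexing="ij")
        return np.column_stack([I.ravel(), J.ravel()])

    def interior(idx):
        # keep nodes lying strictly inside Omega = (0,1) x (0,1)
        return idx[np.all((idx > 0) & (idx < n), axis=1)]

    index_sets = {
        "u1": grid(whole, half),   # z_{i,j+1/2}
        "u2": grid(half, whole),   # z_{i+1/2,j}
        "p": grid(half, half),     # z_{i+1/2,j+1/2}
    }
    return {name: {"closed": idx * h, "open": interior(idx) * h}
            for name, idx in index_sets.items()}
```

For a given `n` the closed velocity sets contain `(n + 1) * (n + 2)` nodes each and the closed pressure set `(n + 2) ** 2`, while the sets inside the domain contain `n * (n - 1)`, `n * (n - 1)` and `n * n` nodes respectively. The number of pressure unknowns is therefore not reduced relative to the velocities.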

In principle, two possibilities can be realized. The first consists in integration strictly over $\Omega$ and gives several types of discrete equations, inside the domain and near the boundary. In the second, the integration is carried out over a domain with small fictitious additional subdomains, which makes the discrete equations more uniform and simpler to code.

To realize the first possibility, we formulate the Bubnov-Galerkin method for the problem (24) using the notation introduced above: find $q^h(x, y) \in H_p$ and $\mathbf{w}^h(x, y) = (w_1^h(x, y), w_2^h(x, y))$, $w_1^h \in H_x$, $w_2^h \in H_y$, which satisfy the boundary condition
\[ \mathbf{w}^h \cdot \mathbf{n} = \mathbf{g}^{k+1} \cdot \mathbf{n} \quad \text{on } \Gamma^h \tag{25} \]
and the integral relation
\[ \frac{1}{\tau}(\mathbf{w}^h, \mathbf{v})_\Omega - (q^h, \operatorname{div}\mathbf{v})_\Omega + (\operatorname{div}\mathbf{w}^h, r)_\Omega = \frac{1}{2}(\mathbf{f}^{k+1}, \mathbf{v})_\Omega + \frac{1}{\tau}(\mathbf{w}^k, \mathbf{v})_\Omega \tag{26} \]
for an arbitrary function $r(x, y) \in H_p$ and for a vector-function $\mathbf{v}(x, y) = (v_1(x, y), v_2(x, y))$, $v_1 \in H_x$, $v_2 \in H_y$, which satisfies the boundary condition
\[ \mathbf{v} \cdot \mathbf{n} = 0 \quad \text{on } \Gamma^h. \tag{27} \]

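Let us write the unknown functions in the form
\[
\begin{aligned}
w_1^h(x, y) &= \sum_{i=0}^{n} \sum_{j=0}^{n-1} w^h_{1,i,j+1/2}\, \varphi_{x,i,j+1/2}(x, y), \\
w_2^h(x, y) &= \sum_{i=0}^{n-1} \sum_{j=0}^{n} w^h_{2,i+1/2,j}\, \varphi_{y,i+1/2,j}(x, y), \\
q^h(x, y) &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} q^h_{i+1/2,j+1/2}\, \varphi_{p,i+1/2,j+1/2}(x, y).
\end{aligned} \tag{28}
\]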


Then the problem (25)–(27) becomes equivalent to a system of linear algebraic equations. To get a diagonal mass matrix we shall systematically use the following quadrature formula, which is the Cartesian product of the trapezium formula:
\[ \int_{x-h/2}^{x+h/2} \int_{y-h/2}^{y+h/2} u(x, y) \, d\Omega \approx \frac{h^2}{4} \sum_{\pm,\pm} u(x \pm h/2, y \pm h/2). \tag{29} \]
Here the sign $\sum$ with the pointer $\pm, \pm$ means the summation of the expression over the 4 possible arguments obtained by fixing one sign, $+$ or $-$, at each position $\pm$.

First of all we consider the boundary condition (25). We introduce the discrete analogues of $\Gamma_x$, $\Gamma_y$:
\[ \Gamma_x^h = \bar\Omega_1^h \cap \Gamma \cup \{z_{0,0}, z_{0,n}, z_{n,0}, z_{n,n}\}, \qquad \Gamma_y^h = \bar\Omega_2^h \cap \Gamma \cup \{z_{0,0}, z_{0,n}, z_{n,0}, z_{n,n}\}. \]
Carrying out the simplifications connected with the concrete form of the normal vector, we get
\[ w_1^h = g_1^{k+1} \quad \text{on } \Gamma_x^h, \tag{30} \]
\[ w_2^h = g_2^{k+1} \quad \text{on } \Gamma_y^h. \tag{31} \]
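The effect of the quadrature formula (29) on the mass matrix can be checked numerically on a single cell. The sketch below is an illustration with hypothetical helper names (`bilinear_basis`, `trapezium_product`, `lumped_cell_mass`); it assumes one square cell $[0, h] \times [0, h]$ with the four bilinear shape functions attached to its corners.

```python
import numpy as np

def bilinear_basis(h):
    """Four bilinear shape functions on the cell [0, h] x [0, h], one per corner."""
    corners = [(0.0, 0.0), (h, 0.0), (h, h), (0.0, h)]

    def make(cx, cy):
        return lambda x, y: (1 - abs(x - cx) / h) * (1 - abs(y - cy) / h)

    return [make(cx, cy) for cx, cy in corners]

def trapezium_product(u, xc, yc, h):
    """Formula (29): h^2/4 times the sum of u over the corners of the cell centred at (xc, yc)."""
    return h * h / 4 * sum(u(xc + sx * h / 2, yc + sy * h / 2)
                           for sx in (-1, 1) for sy in (-1, 1))

def lumped_cell_mass(h):
    """Cell mass matrix with entries (phi_a, phi_b) evaluated by formula (29)."""
    phi = bilinear_basis(h)
    M = np.empty((4, 4))
    for a in range(4):
        for b in range(4):
            M[a, b] = trapezium_product(lambda x, y: phi[a](x, y) * phi[b](x, y),
                                        h / 2, h / 2, h)
    return M
```

Because every shape function vanishes at all corners except its own, the off-diagonal entries are zero and each diagonal entry equals $h^2/4$, which is the diagonal mass matrix referred to above. The formula itself integrates bilinear functions exactly.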

The question of the consequences of the boundary condition (27) arises. For example, consider the near-boundary cell $e_{n-1/2,j}$. From the concrete form of the external normal $(1, 0)$ and the condition (27) at the node $(x_n, y_{j+1/2})$ it follows that
\[ v_{1,n,j+1/2} = 0. \tag{32} \]
Hence, for any coefficients, the terms containing $v_{1,n,j+1/2}$ in both sides of the equality (26) do not give an equation corresponding to this value (or, what is the same, to the node $z_{n,j+1/2}$). Analogously, for the near-boundary cell $e_{i,n-1/2}$ we have
\[ v_{2,i+1/2,n} = 0. \tag{33} \]
Here this value turns to zero and there is no grid equation corresponding to it for $\mathbf{w}^h$, $q^h$. Finally, both situations (32), (33) take place at the same time for the cell $e_{n-1/2,n-1/2}$, and no equation exists for the two nodes $z_{n,n-1/2}$ and $z_{n-1/2,n}$. One of these three situations takes place along the whole grid boundary $\Gamma^h$. To bring the grid equations to a more familiar form we introduce the following notations:


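\[ u_{\mathring{x}}(x) = \frac{u(x + h/2) - u(x - h/2)}{h}, \qquad u_{\mathring{y}}(y) = \frac{u(y + h/2) - u(y - h/2)}{h}, \]
\[
\begin{aligned}
\bar w^h_{1,i,1/2} &= \tfrac{3}{4} w^h_{1,i,1/2} + \tfrac{1}{4} w^h_{1,i,0}; \qquad \bar w^h_{1,i,n-1/2} = \tfrac{3}{4} w^h_{1,i,n-1/2} + \tfrac{1}{4} w^h_{1,i,n}; \\
\bar w^h_{1,i,j+1/2} &= w^h_{1,i,j+1/2}, \quad j = 1, \dots, n - 2; \quad i = 0, \dots, n;
\end{aligned} \tag{34}
\]
\[
\begin{aligned}
\bar w^h_{2,1/2,j} &= \tfrac{3}{4} w^h_{2,1/2,j} + \tfrac{1}{4} w^h_{2,0,j}; \qquad \bar w^h_{2,n-1/2,j} = \tfrac{3}{4} w^h_{2,n-1/2,j} + \tfrac{1}{4} w^h_{2,n,j}; \\
\bar w^h_{2,i+1/2,j} &= w^h_{2,i+1/2,j}, \quad i = 1, \dots, n - 2; \quad j = 0, \dots, n;
\end{aligned} \tag{35}
\]
and similar formulae for $\bar f_1^{k+1}$, $\bar f_2^{k+1}$, $\bar w_1^k$, $\bar w_2^k$, $\bar g_1^{k+1}$, $\bar g_2^{k+1}$. We get the equations
\[ \frac{h^2}{\tau} \bar w^h_{1,i,1/2} + h\,(q^h_{i+1/2,1/2} - q^h_{i-1/2,1/2}) = \frac{h^2}{2} \bar f^{k+1}_{1,i,1/2} + \frac{h^2}{\tau} \bar w^k_{1,i,1/2}, \quad i = 1, \dots, n - 1; \tag{36} \]
and
\[ \frac{1}{\tau} \bar w_1^h + q^h_{\mathring{x}} = \frac{1}{2} \bar f_1^{k+1} + \frac{1}{\tau} \bar w_1^k \quad \text{on } \Omega_1^h, \tag{37} \]
\[ \frac{1}{\tau} \bar w_2^h + q^h_{\mathring{y}} = \frac{1}{2} \bar f_2^{k+1} + \frac{1}{\tau} \bar w_2^k \quad \text{on } \Omega_2^h, \tag{38} \]
\[ (\bar w_1^h)_{\mathring{x}} + (\bar w_2^h)_{\mathring{y}} = 0 \quad \text{on } \Omega_3^h. \tag{39} \]
The boundary conditions are
\[ \bar w_1^h = \bar g_1^{k+1} \quad \text{on } \Gamma_x^h, \tag{40} \]
\[ \bar w_2^h = \bar g_2^{k+1} \quad \text{on } \Gamma_y^h. \tag{41} \]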

It should be noted that we have obtained the difference scheme with staggered nodes which was very popular at the end of the 1970s and the beginning of the 1980s. It is easy to prove that the problem (34)–(41) is stable with respect to the initial data and the right-hand side $\mathbf{f}^{k+1}$ for the components of the velocity vector. For this purpose we introduce grid norms which are analogous to the functional $L_2$-norms:
\[ \|\mathbf{w}\|_{L_2,h} = \left( \|w_1\|^2_{1,h} + \|w_2\|^2_{2,h} \right)^{1/2}, \]
where
\[ \|w_1\|^2_{1,h} = h^2 \sum_{i=1}^{n-1} \sum_{j=0}^{n-1} w_1^2(z_{i,j+1/2}), \tag{42} \]
\[ \|w_2\|^2_{2,h} = h^2 \sum_{i=0}^{n-1} \sum_{j=1}^{n-1} w_2^2(z_{i+1/2,j}), \tag{43} \]
and
\[ \|q\|^2_{3,h} = h^2 \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} q^2(z_{i+1/2,j+1/2}). \tag{44} \]

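Theorem 1. If
\[ g_1^{k+1} = 0 \quad \text{on } \Gamma_x^h, \qquad g_2^{k+1} = 0 \quad \text{on } \Gamma_y^h \tag{45} \]
for the problem (34)–(41), then the following a priori estimate holds:
\[ \|\bar{\mathbf{w}}^h\|_{L_2,h} \le \|\bar{\mathbf{w}}^k\|_{L_2,h} + \frac{\tau}{2} \|\bar{\mathbf{f}}^{k+1}\|_{L_2,h}. \tag{46} \]
□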

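Now we construct the problem for the determination of pressure and consider the question of its stability. To do this, take the difference derivative $(\cdot)_{\mathring{x}}$ of (37) at the nodes of $\Omega_3^h$:
\[ (\bar w_1^h)_{\mathring{x}} + \tau (q^h_{\mathring{x}})_{\mathring{x}} = (\bar w_1^k)_{\mathring{x}} + \frac{\tau}{2} (\bar f_1^{k+1})_{\mathring{x}}. \tag{47} \]
To define the derivative $(\bar w_2^h)_{\mathring{y}}$ we take the difference derivative $(\cdot)_{\mathring{y}}$ of (38):
\[ (\bar w_2^h)_{\mathring{y}} + \tau (q^h_{\mathring{y}})_{\mathring{y}} = (\bar w_2^k)_{\mathring{y}} + \frac{\tau}{2} (\bar f_2^{k+1})_{\mathring{y}}. \tag{48} \]
Now we eliminate $(\bar w_1^h)_{\mathring{x}}$ and $(\bar w_2^h)_{\mathring{y}}$ in (39), divide the obtained equality by $\tau$, and move the known expressions to the right-hand side. As a result we get
\[ -(q^h_{\mathring{x}})_{\mathring{x}} - (q^h_{\mathring{y}})_{\mathring{y}} = -\frac{1}{\tau} (\bar w_1^k)_{\mathring{x}} - \frac{1}{\tau} (\bar w_2^k)_{\mathring{y}} - \frac{1}{2} (\bar f_1^{k+1})_{\mathring{x}} - \frac{1}{2} (\bar f_2^{k+1})_{\mathring{y}} \quad \text{on } \Omega_3^h. \tag{49} \]
At the nodes of $\Gamma_x^h$ and $\Gamma_y^h$, conditions of Neumann type follow from (37), (40) and (38), (41). For example, on $\Gamma_x^h$ from (37) and (40) it follows that
\[ q^h_{\mathring{x}} = -\frac{1}{\tau} \bar g_1^{k+1} + \frac{1}{\tau} \bar w_1^k + \frac{1}{2} \bar f_1^{k+1} \quad \text{on } \Gamma_x^h. \tag{50} \]
On $\Gamma_y^h$ from (38) and (41) it follows that
\[ q^h_{\mathring{y}} = -\frac{1}{\tau} \bar g_2^{k+1} + \frac{1}{\tau} \bar w_2^k + \frac{1}{2} \bar f_2^{k+1} \quad \text{on } \Gamma_y^h. \tag{51} \]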


The system of linear algebraic equations (49)–(51) can be reduced to the (Schur complement) system
\[ BQ = G \tag{52} \]
with the symmetric matrix $B$. This matrix is the same as for the discrete Poisson equation with the Neumann boundary condition. It is well known that this matrix is singular, the dimension of its kernel equals 1, and the basis of the kernel consists of the single constant $n^2$-vector $S = (1, \dots, 1)$. Thus, the system (52) has a solution if and only if the right-hand side $G$ is orthogonal to $S$:
\[ \sum_{i,j=1}^{n} G_{ij} = 0. \tag{53} \]
Let this be valid. Then the system (52) has an infinite number of solutions. We take the (normal) one which is orthogonal to $S$:
\[ \sum_{i,j=1}^{n} Q_{ij} = 0. \tag{54} \]

Note that this equality is the discrete analogue of the condition (5). In Theorem 1 we considered the impact of the initial values and the right-hand side $\mathbf{f}$ on the computed velocity. Now let us study the situation when a non-zero right-hand side arises in (39) owing to an approximation (truncation) error or to a residual of an iterative process. For this purpose consider the problem
\[ \frac{1}{\tau} z_1^h + r^h_{\mathring{x}} = 0 \quad \text{on } \Omega_1^h, \tag{55} \]
\[ \frac{1}{\tau} z_2^h + r^h_{\mathring{y}} = 0 \quad \text{on } \Omega_2^h, \tag{56} \]
\[ (z_1^h)_{\mathring{x}} + (z_2^h)_{\mathring{y}} = \psi^h \quad \text{on } \Omega_3^h, \tag{57} \]
\[ z_1^h = 0 \quad \text{on } \Gamma_x^h, \tag{58} \]
\[ z_2^h = 0 \quad \text{on } \Gamma_y^h. \tag{59} \]

Here the grid function $\psi^h$ is defined on $\Omega_3^h$; $\mathbf{z}^h = (z_1^h, z_2^h)$.

Theorem 2. For the problem (55)–(59) the following a priori estimate holds:
\[ \|\mathbf{z}^h\|_{L_2,h} \le c_1 \|\psi^h\|_{3,h}, \tag{60} \]
where the constant $c_1$ depends on $\Omega$ only. □

Solving the system (49)–(51) we obtain the grid function of pressure $q^h$ at the nodes of $\Omega_3^h$ at the time level $t_{k+1}$. After that, by formulae (37) and (38) we calculate the grid functions $w_1^h$ and $w_2^h$ explicitly. This calculation concludes the description of the fractional step of pressure action.


Remark 2. It should be noted that the special placing of the nodes ensures the stability of the computation of pressure (see, for example, [31], [2]). □

To realize the approach with fictitious domains, we first consider the extended domain $\Omega_1 = (0, 1) \times (-h/2, 1 + h/2)$ and smoothly prolong the equation (19) into the additional strips. For this purpose we prolong $w_1^k$, $w_1^{k+1}$, $\partial q^{k+1}/\partial x$ through the boundary $\Gamma_y$ using Taylor expansions of these functions in the direction $y$. After that we compute
\[ f_1^{k+1} = 2 \left( \frac{\partial q^{k+1}}{\partial x} + \frac{1}{\tau} w_1^{k+1} - \frac{1}{\tau} w_1^k \right) \]
in the two strips $\Omega_1 \setminus \Omega$. Thus, the equation (19) is valid in the extended domain $\Omega_1$. Similarly, by Taylor expansions we prolong the boundary function $g_1^{k+1}$ onto the 4 segments $\{0, 1\} \times (-h/2, 0)$ and $\{0, 1\} \times (1, 1 + h/2)$. This gives the boundary condition
\[ w_1 = g_1^{k+1} \quad \text{on the extended segments } \{0, 1\} \times (-h/2, 1 + h/2). \tag{61} \]

To simplify the presentation we put $v_2 = 0$ and $r = 0$ in an integral relation like (26) and obtain the following Galerkin formulation: find $q^h(x, y) \in H_p$ and $w_1^h(x, y) \in H_x$ which satisfy the boundary condition (61) and the integral relation
\[ \frac{1}{\tau}(w_1^h, v_1)_{\Omega_1} - \left(q^h, \frac{\partial v_1}{\partial x}\right)_{\Omega_1} = \frac{1}{2}(f_1^{k+1}, v_1)_{\Omega_1} + \frac{1}{\tau}(w_1^k, v_1)_{\Omega_1} \tag{62} \]
for an arbitrary function $v_1(x, y) \in H_x$ which satisfies the boundary condition
\[ v_1 = 0 \quad \text{on the extended segments } \{0, 1\} \times (-h/2, 1 + h/2). \tag{63} \]

On the extended domain $\Omega_1$ we get the same equations for $j = 0, \dots, n - 1$ and some additional equations at the fictitious node rows $y_{-1/2}$ and $y_{n+1/2}$. The latter equations do not influence the approximate solution at the internal nodes, and we shall omit them in our algorithmic constructions. Thus, this approach with small fictitious domains gives uniform equations at all internal nodes of $\Omega_1^h$:
\[ \frac{h^2}{\tau} w^h_{1,i,j+1/2} + h\,(q^h_{i+1/2,j+1/2} - q^h_{i-1/2,j+1/2}) = \frac{h^2}{2} f^{k+1}_{1,i,j+1/2} + \frac{h^2}{\tau} w^k_{1,i,j+1/2}, \tag{64} \]
$i = 1, \dots, n - 1$, $j = 0, \dots, n - 1$, with the boundary conditions
\[ w^h_{1,i,j+1/2} = g^{k+1}_{1,i,j+1/2}, \quad i = 0, n, \quad j = 0, \dots, n - 1. \tag{65} \]
Note that these equations do not use any data from the fictitious domains; therefore we need the fictitious domains only from the theoretical point of view, without any algorithmic complication.


Second, we consider the extended domain $\Omega_2 = (-h/2, 1 + h/2) \times (0, 1)$, smoothly prolong the equation (20) into the additional strips $\Omega_2 \setminus \Omega$, and prolong the boundary function $g_2^{k+1}$ onto the extended segments $(-h/2, 1 + h/2) \times \{0, 1\}$. Thus, the equation (20) is valid on the extended domain $\Omega_2$ together with the boundary condition
\[ w_2^h = g_2^{k+1} \quad \text{on the extended segments } (-h/2, 1 + h/2) \times \{0, 1\}. \tag{66} \]
Again we put $v_1 = 0$ and $r = 0$ in the integral relation (26), take $\Omega_2$ instead of $\Omega$, and obtain uniform equations at all internal nodes of $\Omega_2^h$:
\[ \frac{h^2}{\tau} w^h_{2,i+1/2,j} + h\,(q^h_{i+1/2,j+1/2} - q^h_{i+1/2,j-1/2}) = \frac{h^2}{2} f^{k+1}_{2,i+1/2,j} + \frac{h^2}{\tau} w^k_{2,i+1/2,j}, \tag{67} \]
$i = 0, \dots, n - 1$, $j = 1, \dots, n - 1$, with the boundary conditions
\[ w^h_{2,i+1/2,j} = g^{k+1}_{2,i+1/2,j}, \quad i = 0, \dots, n - 1, \quad j = 0, n. \]
One can see that we directly obtain a system of algebraic equations like (37)–(41); hence all the representations and conclusions (42)–(60) remain valid with $\bar w_i$ replaced by $w_i$. Thus, this problem is stable by Theorems 1 and 2.
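The uniform equations (64), (67) together with the discrete continuity equation can be turned into a short computation. The sketch below is a minimal illustration under stated assumptions and not the authors' solver: the function names are hypothetical, homogeneous boundary data $g = 0$ are assumed, $h = 1/n$, the matrix $B$ of (52) is assembled densely, and the normal solution (54) is obtained as the minimum-norm least-squares solution from `numpy.linalg.lstsq`. Array layout (an assumption of this sketch): `w1[i, j]` holds the value at $z_{i,j+1/2}$, `w2[i, j]` at $z_{i+1/2,j}$, `q[i, j]` at $z_{i+1/2,j+1/2}$.

```python
import numpy as np

def neumann_laplacian(n):
    """Dense matrix B of (52): 5-point Neumann Laplacian on n x n pressure cells."""
    B = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    B[i * n + j, i * n + j] += 1.0
                    B[i * n + j, ii * n + jj] -= 1.0
    return B

def divergence(w1, w2, h):
    """Central differences of the staggered velocities at the pressure nodes."""
    return (w1[1:, :] - w1[:-1, :]) / h + (w2[:, 1:] - w2[:, :-1]) / h

def pressure_step(w1k, w2k, f1, f2, tau):
    """One fractional step of pressure action with g = 0 (assumption of this sketch)."""
    n = w2k.shape[0]
    h = 1.0 / n
    # known right-hand sides of (64), (67), scaled by tau / h^2
    w1s = w1k + 0.5 * tau * f1
    w2s = w2k + 0.5 * tau * f2
    w1s[[0, -1], :] = 0.0          # boundary condition (65) with g = 0
    w2s[:, [0, -1]] = 0.0
    # right-hand side of the pressure system; it sums to zero, cf. (53)
    G = -(h * h / tau) * divergence(w1s, w2s, h).ravel()
    # minimum-norm solution is orthogonal to constants, cf. (54)
    Q, *_ = np.linalg.lstsq(neumann_laplacian(n), G, rcond=None)
    q = Q.reshape(n, n)
    w1, w2 = w1s.copy(), w2s.copy()
    w1[1:-1, :] -= tau / h * (q[1:, :] - q[:-1, :])    # from (64)
    w2[:, 1:-1] -= tau / h * (q[:, 1:] - q[:, :-1])    # from (67)
    return w1, w2, q
```

For random initial data the returned velocity should be discretely divergence-free up to rounding, the pressure should have zero sum, and with $f = 0$ the grid norm of the velocity should not grow, in line with the estimate of Theorem 1. A dense matrix is used only for brevity; for realistic grids a sparse or multigrid solver would be the natural choice.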

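3 Discretization of the fractional step of convection-diffusion

Now we consider the problems (9)–(11) in turn for $s = 1$ and $s = 2$. The first problem has the form
\[ \frac{\partial v_1}{\partial t} - \nu \Delta v_1 + \frac{1}{2}(\mathbf{u}^\tau_k \cdot \nabla)v_1 + \frac{1}{2}\operatorname{div}(v_1 \mathbf{u}^\tau_k) = \frac{1}{2} f_1 \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{68} \]
\[ v_1 = g_1 \quad \text{on } \Gamma \times [t_k, t_{k+1}], \tag{69} \]
\[ v_1(x, y, t_k) = u^\tau_{1,k}(x, y) \quad \text{in } \Omega. \tag{70} \]
We once more split this step, now in the $y$ and $x$ directions. First we use the initial condition (70) in the following form:
\[ w(x, y, t_k) = a(x, y) \quad \text{on } \bar\Omega. \tag{71} \]
To simplify the notation, in this section we put
\[ a(x, y) = u^\tau_{1,k}(x, y) \quad \text{and} \quad b(x, y) = u^\tau_{2,k}(x, y). \tag{72} \]
Then two problems are solved on the segment $(t_k, t_{k+1})$. The first problem contains the space derivatives only with respect to $y$:
\[ \frac{\partial w}{\partial t} - \nu \frac{\partial^2 w}{\partial y^2} + \frac{1}{2} b \frac{\partial w}{\partial y} + \frac{1}{2} \frac{\partial (bw)}{\partial y} = \frac{1}{4} f_1 \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{73} \]
\[ w = g_1 \quad \text{on } \Gamma_y \times [t_k, t_{k+1}], \tag{74} \]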


where
\[ \Gamma_x = \{(x, y) \in \Gamma : (x = 0) \vee (x = 1),\ y \in [0, 1)\}, \qquad \Gamma_y = \{(x, y) \in \Gamma : x \in [0, 1),\ (y = 0) \vee (y = 1)\}. \]

Remark 3. Note the modified boundary condition (74) in comparison with (69). If we imposed
\[ w = g_1 \quad \text{on } \Gamma \times [t_k, t_{k+1}] \]
instead of (74), we would obtain an overdetermined problem. □

The second problem contains the space derivatives only with respect to $x$:
\[ \frac{\partial u}{\partial t} - \nu \frac{\partial^2 u}{\partial x^2} + \frac{1}{2} a \frac{\partial u}{\partial x} + \frac{1}{2} \frac{\partial (au)}{\partial x} = \frac{1}{4} f_1 \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{75} \]
\[ u = g_1 \quad \text{on } \Gamma_x \times [t_k, t_{k+1}], \tag{76} \]
\[ u(x, y, t_k) = w(x, y, t_{k+1}) \quad \text{on } \bar\Omega. \tag{77} \]

The solution of this problem is the result of a loop of two fractional steps on the strip $[t_k, t_{k+1}]$:
\[ v_1(x, y, t_{k+1}) = u(x, y, t_{k+1}) \quad \text{on } \bar\Omega. \]
Now consider the discretization of the problem (71), (73)–(74). The time discretization is achieved by the difference method by means of the substitution (18). After moving the known terms to the right-hand side we obtain a parametric family (with the parameter $x$) of stationary ordinary differential equations at the time level $t_{k+1}$:
\[ \frac{1}{\tau} w^{k+1} - \nu \frac{\partial^2 w^{k+1}}{\partial y^2} + \frac{1}{2} b \frac{\partial w^{k+1}}{\partial y} + \frac{1}{2} \frac{\partial (b w^{k+1})}{\partial y} = \frac{1}{\tau} a + \frac{1}{4} f_1^{k+1} \quad \text{in } \Omega \tag{78} \]
with the boundary condition
\[ w^{k+1} = g_1^{k+1} \quad \text{on } \Gamma_y. \tag{79} \]

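For the space discretization we apply the finite element method, so we turn to the generalized formulation. Take an arbitrary function $v(x, y)$ which satisfies the condition
\[ v = 0 \quad \text{on } \Gamma_y. \tag{80} \]
Multiply the equation (78) by $v$ and integrate by parts over $\Omega$, applying (80). As a result, we obtain the equality
\[
\begin{aligned}
&\frac{1}{\tau}(w^{k+1}, v)_\Omega + \nu \left(\frac{\partial w^{k+1}}{\partial y}, \frac{\partial v}{\partial y}\right)_\Omega + \frac{1}{2}\left(b \frac{\partial w^{k+1}}{\partial y}, v\right)_\Omega - \frac{1}{2}\left(b w^{k+1}, \frac{\partial v}{\partial y}\right)_\Omega \\
&\quad = \frac{1}{\tau}(a, v)_\Omega + \frac{1}{4}(f_1^{k+1}, v)_\Omega.
\end{aligned} \tag{81}
\]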
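The reason for keeping the convective term in the symmetrized form becomes visible if one takes $v = w^{k+1}$ in (81), which is admissible when $g_1^{k+1} = 0$ on $\Gamma_y$. The following short computation is a sketch of the continuous analogue of the stability argument, with $\|\cdot\|$ denoting the $L_2(\Omega)$-norm:

\[ \frac{1}{2}\left(b \frac{\partial w^{k+1}}{\partial y}, w^{k+1}\right)_\Omega - \frac{1}{2}\left(b w^{k+1}, \frac{\partial w^{k+1}}{\partial y}\right)_\Omega = 0 \quad \text{for any } b, \]

\[ \frac{1}{\tau}\|w^{k+1}\|^2 + \nu \left\|\frac{\partial w^{k+1}}{\partial y}\right\|^2 = \frac{1}{\tau}(a, w^{k+1})_\Omega + \frac{1}{4}(f_1^{k+1}, w^{k+1})_\Omega \quad\Longrightarrow\quad \|w^{k+1}\| \le \|a\| + \frac{\tau}{4}\|f_1^{k+1}\|. \]

The last step uses the Cauchy-Schwarz inequality and drops the non-negative diffusion term. No assumption on the sign, size or divergence of $b$ is needed.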


To approximate this problem, we employ the space $H_x$ introduced in section 2. Besides, denote the following set of boundary nodes by $\Gamma_2^h$:
\[ \Gamma_2^h = \{z_{i,0} = (x_i, 0),\ z_{i,n} = (x_i, 1) : i = 0, 1, \dots, n\}. \tag{82} \]
Again, two possibilities can be realized: integration over $\Omega$, and integration over domains with small fictitious additional subdomains in order to simplify the discrete equations. The first approach gives the following Galerkin scheme for the problem (79)–(81): find a function $w^h(x, y) \in H_x$ which satisfies the boundary condition
\[ w^h = g_1^{k+1} \quad \text{on } \Gamma_2^h \tag{83} \]
and the integral relation
\[
\begin{aligned}
&\frac{1}{\tau}(w^h, v)_\Omega + \nu \left(\frac{\partial w^h}{\partial y}, \frac{\partial v}{\partial y}\right)_\Omega + \frac{1}{2}\left(b \frac{\partial w^h}{\partial y}, v\right)_\Omega - \frac{1}{2}\left(b w^h, \frac{\partial v}{\partial y}\right)_\Omega \\
&\quad = \frac{1}{\tau}(a, v)_\Omega + \frac{1}{4}(f_1^{k+1}, v)_\Omega
\end{aligned} \tag{84}
\]
for an arbitrary function $v \in H_x$ which satisfies the boundary condition
\[ v = 0 \quad \text{on } \Gamma_2^h. \tag{85} \]
We will seek the unknown function $w^h$ in the form
\[ w^h(x, y) = \sum_{i=0}^{n} \sum_{j=-1}^{n} w^h_{i,j+1/2}\, \varphi_{x,i,j+1/2}(x, y). \tag{86} \]

The problem (83)–(85) is equivalent to a system of linear algebraic equations with respect to the unknowns $w^h_{i,j+1/2}$. To form the coefficients of this system, we suppose that
\[ a \in H_x, \quad b \in H_y. \tag{87} \]
This will be ensured during the final assembly of the discrete time-dependent problem. To simplify the mass and stiffness matrices, we again use the trapezium quadrature formula (29). To study the grid equations further we perform their nodal assembly. For the equality (84) to hold for an arbitrary value of $v_{i,j+1/2}$, we must equate its coefficients on the left-hand and right-hand sides. At inner nodes the four elements $e_{i\pm1/2,j}$, $e_{i\pm1/2,j+1}$ have nonzero coefficients. Summing these coefficients over the four elements and equating them on the left-hand and right-hand sides, we get
\[
\begin{aligned}
&\left(\frac{h^2}{\tau} + 2\nu\right) w^h_{i,j+1/2} + \left(-\nu - \frac{h}{4}(b_{i-1/2,j} + b_{i+1/2,j})\right) w^h_{i,j-1/2} + \left(-\nu + \frac{h}{4}(b_{i-1/2,j+1} + b_{i+1/2,j+1})\right) w^h_{i,j+3/2} \\
&\quad = \frac{h^2}{\tau} a_{i,j+1/2} + \frac{h^2}{4} f^{k+1}_{1,i,j+1/2}, \quad i = 1, 2, \dots, n - 1; \quad j = 1, 2, \dots, n - 2.
\end{aligned} \tag{88}
\]


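Taking into consideration that $b \in H_y$ we get the shorter form
\[
\begin{aligned}
&\left(\frac{h^2}{\tau} + 2\nu\right) w^h_{i,j+1/2} + \left(-\nu - \frac{h}{2} b^*_{i,j}\right) w^h_{i,j-1/2} + \left(-\nu + \frac{h}{2} b^*_{i,j+1}\right) w^h_{i,j+3/2} \\
&\quad = \frac{h^2}{\tau} a_{i,j+1/2} + \frac{h^2}{4} f^{k+1}_{1,i,j+1/2}, \quad i = 1, \dots, n - 1; \quad j = 1, \dots, n - 2.
\end{aligned} \tag{89}
\]
From here on the asterisk $*$ indicates that the value of the function is a linear combination of nodal values, for example, $b^*_{i,j} = (b_{i-1/2,j} + b_{i+1/2,j})/2$. At the boundary nodes of $\Gamma_x^h$ the assembly is carried out over two elements only:
\[
\begin{aligned}
&\left(\frac{h^2}{2\tau} + \nu\right) w^h_{0,j+1/2} + \left(-\frac{\nu}{2} - \frac{h}{4} b_{1/2,j}\right) w^h_{0,j-1/2} + \left(-\frac{\nu}{2} + \frac{h}{4} b_{1/2,j+1}\right) w^h_{0,j+3/2} \\
&\quad = \frac{h^2}{2\tau} a_{0,j+1/2} + \frac{h^2}{8} f^{k+1}_{1,0,j+1/2},
\end{aligned} \tag{90}
\]
\[
\begin{aligned}
&\left(\frac{h^2}{2\tau} + \nu\right) w^h_{n,j+1/2} + \left(-\frac{\nu}{2} - \frac{h}{4} b_{n-1/2,j}\right) w^h_{n,j-1/2} + \left(-\frac{\nu}{2} + \frac{h}{4} b_{n-1/2,j+1}\right) w^h_{n,j+3/2} \\
&\quad = \frac{h^2}{2\tau} a_{n,j+1/2} + \frac{h^2}{8} f^{k+1}_{1,n,j+1/2},
\end{aligned} \tag{91}
\]
$j = 1, \dots, n - 2$. To close the system of linear algebraic equations, we first supplement it with the boundary conditions
\[ w^h(x_i, y_j) = g^{k+1}_{1,i,j}, \quad i = 0, 1, \dots, n; \quad j = 0, n. \tag{92} \]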

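To write these equalities more concisely we introduce the grid operators of local averaging in $x$ and in $y$:
\[ u_{\hat{x}}(x) = \frac{u(x - h/2) + u(x + h/2)}{2}, \qquad u_{\hat{y}}(y) = \frac{u(y - h/2) + u(y + h/2)}{2}. \tag{93} \]
With the help of the notation (82) we can write
\[ w^h = w^h_{\hat{y}} = g_1^{k+1} \quad \text{on } \Gamma_2^h. \tag{94} \]
Analogously we get
\[ v = v_{\hat{y}} = 0 \quad \text{on } \Gamma_2^h. \tag{95} \]
Assembling the algebraic equations corresponding to $v_{i,1/2}$, $i = 1, \dots, n - 1$, over the 4 elements $e_{i\pm1/2,0} \cap \Omega$ and $e_{i\pm1/2,1}$, we get
\[
\left(\frac{3h^2}{4\tau} + 3\nu\right) w^h_{i,1/2} + \left(-\nu + \frac{h}{2} b^*_{i,1}\right) w^h_{i,3/2} = \frac{3h^2}{4\tau} a_{i,1/2} + \frac{3h^2}{16} f^{k+1}_{1,i,1/2} + \left(\frac{h}{2} b^*_{i,1/4} + 2\nu\right) g^{k+1}_{1,i,0} \tag{96}
\]
for $i = 1, \dots, n - 1$. At the boundary node $z_{0,1/2}$ the algebraic equation is assembled over the 2 elements $e_{1/2,0} \cap \Omega$ and $e_{1/2,1}$:
\[
\left(\frac{3h^2}{8\tau} + \frac{3}{2}\nu\right) w^h_{0,1/2} + \left(-\frac{\nu}{2} + \frac{h}{4} b_{1/2,1}\right) w^h_{0,3/2} = \frac{3h^2}{8\tau} a_{0,1/2} + \frac{3h^2}{32} f^{k+1}_{1,0,1/2} + \left(\frac{h}{4} b^*_{1/2,1/4} + \nu\right) g^{k+1}_{1,0,0}. \tag{97}
\]
A similar equation is valid at the node $z_{n,1/2}$:
\[
\left(\frac{3h^2}{8\tau} + \frac{3}{2}\nu\right) w^h_{n,1/2} + \left(-\frac{\nu}{2} + \frac{h}{4} b_{n-1/2,1}\right) w^h_{n,3/2} = \frac{3h^2}{8\tau} a_{n,1/2} + \frac{3h^2}{32} f^{k+1}_{1,n,1/2} + \left(\frac{h}{4} b^*_{n-1/2,1/4} + \nu\right) g^{k+1}_{1,n,0}. \tag{98}
\]
Without repeating the assembly we write the algebraic equations on the upper part of the boundary $\Gamma_2^h$:
\[
\left(\frac{3h^2}{8\tau} + \frac{3}{2}\nu\right) w^h_{0,n-1/2} + \left(-\frac{\nu}{2} - \frac{h}{4} b_{1/2,n-1}\right) w^h_{0,n-3/2} = \frac{3h^2}{8\tau} a_{0,n-1/2} + \frac{3h^2}{32} f^{k+1}_{1,0,n-1/2} + \left(-\frac{h}{4} b^*_{1/2,n-1/4} + \nu\right) g^{k+1}_{1,0,n}; \tag{99}
\]
\[
\left(\frac{3h^2}{4\tau} + 3\nu\right) w^h_{i,n-1/2} + \left(-\nu - \frac{h}{2} b^*_{i,n-1}\right) w^h_{i,n-3/2} = \frac{3h^2}{4\tau} a_{i,n-1/2} + \frac{3h^2}{16} f^{k+1}_{1,i,n-1/2} + \left(-\frac{h}{2} b^*_{i,n-1/4} + 2\nu\right) g^{k+1}_{1,i,n}, \tag{100}
\]
$i = 1, \dots, n - 1$;
\[
\left(\frac{3h^2}{8\tau} + \frac{3}{2}\nu\right) w^h_{n,n-1/2} + \left(-\frac{\nu}{2} - \frac{h}{4} b_{n-1/2,n-1}\right) w^h_{n,n-3/2} = \frac{3h^2}{8\tau} a_{n,n-1/2} + \frac{3h^2}{32} f^{k+1}_{1,n,n-1/2} + \left(-\frac{h}{4} b^*_{n-1/2,n-1/4} + \nu\right) g^{k+1}_{1,n,n}. \tag{101}
\]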

We can prove that the obtained system is stable with respect to the initial data and the right-hand side. Since the mass matrix is not constant over the nodes of $\Omega_1^h$, introduce the weight coefficients
\[ \sigma_i = \begin{cases} 1 & \text{if } i = 1, \dots, n - 1, \\ 1/2 & \text{if } i = 0, n, \end{cases} \tag{102} \]
and
\[ \rho_j = \begin{cases} 1/4 & \text{if } j = 0, n, \\ 3/4 & \text{if } j = 1/2,\ n - 1/2, \\ 1 & \text{if } j = 3/2, \dots, n - 3/2. \end{cases} \tag{103} \]
With these weights introduce the norm
\[ \|w^h\|^2_{1,\sigma} = h^2 \sum_{i=0}^{n} \sum_{j=0}^{n-1} \sigma_i\, \rho_{j+1/2}\, (w^h_{i,j+1/2})^2. \tag{104} \]

Theorem 3. Let the condition
\[ g_1^{k+1} = 0 \quad \text{on } \Gamma_y^h \tag{105} \]
be valid. Then for the system (89)–(91), (96)–(101) the a priori estimate
\[ \|w^h\|_{1,\sigma} \le \|a\|_{1,\sigma} + \frac{\tau}{4} \|f_1^{k+1}\|_{1,\sigma} \tag{106} \]
holds for any grid function $b \in H_y$. □

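Now consider the discretization of the problem (75)–(77). The time discretization is achieved by means of the substitution (18). After moving the term known from (77) to the right-hand side, we obtain a parametric family (with the parameter $y$) of stationary ordinary differential equations at the time level $t_{k+1}$:
\[ \frac{1}{\tau} u^{k+1} - \nu \frac{\partial^2 u^{k+1}}{\partial x^2} + \frac{1}{2} a \frac{\partial u^{k+1}}{\partial x} + \frac{1}{2} \frac{\partial (a u^{k+1})}{\partial x} = \frac{1}{\tau} w^{k+1} + \frac{1}{4} f_1^{k+1} \quad \text{in } \Omega \tag{107} \]
with the boundary condition
\[ u^{k+1} = g_1^{k+1} \quad \text{on } \Gamma_x. \tag{108} \]
For the space discretization we turn to the generalized formulation. To this end we take an arbitrary function $v(x, y)$ which satisfies the condition
\[ v = 0 \quad \text{on } \Gamma_x. \tag{109} \]
Multiply the equation (107) by $v$ and integrate by parts over $\Omega$ with the help of (109). As a result, we obtain
\[ \frac{1}{\tau} (u^{k+1}, v)_\Omega + \nu \left(\frac{\partial u^{k+1}}{\partial x}, \frac{\partial v}{\partial x}\right)_\Omega + \frac{1}{2} \left(a \frac{\partial u^{k+1}}{\partial x}, v\right)_\Omega - \frac{1}{2} \left(a u^{k+1}, \frac{\partial v}{\partial x}\right)_\Omega = \frac{1}{\tau} (w^{k+1}, v)_\Omega + \frac{1}{4} (f_1^{k+1}, v)_\Omega. \tag{110} \]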


To approximate this problem, we again employ the space $H_x$ introduced in section 2. As a result, we obtain the following Galerkin problem: find a function $u^h(x, y) \in H_x$ which satisfies the boundary condition
\[ u^h = g_1^{k+1} \quad \text{on } \Gamma_x^h \tag{111} \]
and the integral relation
\[ \frac{1}{\tau} (u^h, v)_\Omega + \nu \left(\frac{\partial u^h}{\partial x}, \frac{\partial v}{\partial x}\right)_\Omega + \frac{1}{2} \left(a \frac{\partial u^h}{\partial x}, v\right)_\Omega - \frac{1}{2} \left(a u^h, \frac{\partial v}{\partial x}\right)_\Omega = \frac{1}{\tau} (w^h, v)_\Omega + \frac{1}{4} (f_1^{k+1}, v)_\Omega \tag{112} \]
for an arbitrary function $v \in H_x$ which satisfies the boundary condition
\[ v = 0 \quad \text{on } \Gamma_x^h. \tag{113} \]
It should be noted that in the right-hand side we replace the function $w^{k+1}$ by its approximation $w^h \in H_x$ obtained by solving the problem (83)–(85). We seek the unknown function $u^h$ in the form
\[ u^h(x, y) = \sum_{i=0}^{n} \sum_{j=0}^{n-1} u^h_{i,j+1/2}\, \varphi_{x,i,j+1/2}(x, y). \tag{114} \]
Then the problem (112)–(114) is equivalent to a system of linear algebraic equations with respect to the unknowns $u^h_{i,j+1/2}$. Using the trapezoidal quadrature formula to compute the stiffness matrix of each element $e_{i+1/2,j}$, we obtain
\[ \left(\frac{h^2}{\tau} + 2\nu\right) u^h_{i,j+1/2} + \left(-\nu - \frac{h}{2} a^*_{i-1/2,j+1/2}\right) u^h_{i-1,j+1/2} + \left(-\nu + \frac{h}{2} a^*_{i+1/2,j+1/2}\right) u^h_{i+1,j+1/2} = \frac{h^2}{\tau} w^h_{i,j+1/2} + \frac{h^2}{4} f^{k+1}_{1,i,j+1/2}, \]
\[ i = 1, \dots, n-1; \quad j = 1, \dots, n-2. \tag{115} \]
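For a fixed $j$, the interior equations (115) together with the Dirichlet values (111) form a tridiagonal system in $i$. The following is a minimal sketch of solving one such grid line by the Thomas algorithm. The function name `solve_x_line`, the argument layout, and the list `a_star` (assumed to hold $a^*_{i+1/2,j+1/2}$ for $i = 0, \dots, n-1$) are illustrative assumptions and do not come from the original text.

```python
def solve_x_line(w, f, a_star, h, tau, nu, g_left, g_right):
    """Solve equations (115) along one grid line for u_0..u_n.

    w, f     : lists of length n+1 with w^h and f_1^{k+1} at i = 0..n
    a_star   : list of length n, a_star[i] ~ a*_{i+1/2} (assumed layout)
    g_left,
    g_right  : Dirichlet values of u at i = 0 and i = n, cf. (111)
    Requires n >= 2.
    """
    n = len(w) - 1
    m = n - 1  # number of interior unknowns
    diag = [h * h / tau + 2.0 * nu] * m
    lower = [-nu - 0.5 * h * a_star[i - 1] for i in range(1, n)]  # u_{i-1}
    upper = [-nu + 0.5 * h * a_star[i] for i in range(1, n)]      # u_{i+1}
    rhs = [(h * h / tau) * w[i] + 0.25 * h * h * f[i] for i in range(1, n)]

    # Move the known boundary values to the right-hand side.
    rhs[0] -= lower[0] * g_left
    rhs[-1] -= upper[-1] * g_right

    # Forward elimination.
    for k in range(1, m):
        factor = lower[k] / diag[k - 1]
        diag[k] -= factor * upper[k - 1]
        rhs[k] -= factor * rhs[k - 1]

    # Back substitution.
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        x[k] = (rhs[k] - upper[k] * x[k + 1]) / diag[k]

    return [g_left] + x + [g_right]
```

Since each line costs $O(n)$ operations and the lines for different $j$ are independent, this step is a natural candidate for parallel execution; the near-boundary lines would use the modified coefficients of the corresponding boundary equations instead.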

To close the system of linear algebraic equations along these lines, we supplement it with the boundary conditions (111). Assembling the algebraic equations corresponding to $v_{i,1/2}$, $i = 1, \dots, n-1$, over 4 elements $e_{i\pm1/2,0} \cap \Omega$ and $e_{i\pm1/2,1}$, we get:
\[ \left(\frac{3h^2}{4\tau} + \frac{3\nu}{2}\right) u^h_{i,1/2} + \left(-\frac{3\nu}{4} - \frac{3h}{8} a^*_{i-1/2,1/2}\right) u^h_{i-1,1/2} + \left(-\frac{3\nu}{4} + \frac{3h}{8} a^*_{i+1/2,1/2}\right) u^h_{i+1,1/2} = \frac{3h^2}{4\tau} w^h_{i,1/2} + \frac{3h^2}{16} f^{k+1}_{1,i,1/2}, \quad i = 1, \dots, n-1. \tag{116} \]


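And finally we assemble the algebraic equations corresponding to $v_{i,0}$, $i = 1, \dots, n-1$, over 2 elements $e_{i\pm1/2,0} \cap \Omega$:
\[ \left(\frac{h^2}{4\tau} + \frac{\nu}{2}\right) u^h_{i,0} + \left(-\frac{\nu}{4} + \frac{h}{8} a^*_{i+1/2,0}\right) u^h_{i+1,0} + \left(-\frac{\nu}{4} - \frac{h}{8} a^*_{i-1/2,0}\right) u^h_{i-1,0} = \frac{h^2}{4\tau} w^h_{i,0} + \frac{h^2}{16} f^{k+1}_{1,i,0}, \quad i = 1, \dots, n-1. \tag{117} \]
Similar equations arise near the upper part of the boundary $\Gamma_y$:
\[ \left(\frac{3h^2}{4\tau} + \frac{3\nu}{2}\right) u^h_{i,n-1/2} + \left(-\frac{3\nu}{4} - \frac{3h}{8} a^*_{i-1/2,n-1/2}\right) u^h_{i-1,n-1/2} + \left(-\frac{3\nu}{4} + \frac{3h}{8} a^*_{i+1/2,n-1/2}\right) u^h_{i+1,n-1/2} = \frac{3h^2}{4\tau} w^h_{i,n-1/2} + \frac{3h^2}{16} f^{k+1}_{1,i,n-1/2} \tag{118} \]
and
\[ \left(\frac{h^2}{4\tau} + \frac{\nu}{2}\right) u^h_{i,n} + \left(-\frac{\nu}{4} + \frac{h}{8} a^*_{i+1/2,n}\right) u^h_{i+1,n} + \left(-\frac{\nu}{4} - \frac{h}{8} a^*_{i-1/2,n}\right) u^h_{i-1,n} = \frac{h^2}{4\tau} w^h_{i,n} + \frac{h^2}{16} f^{k+1}_{1,i,n}, \quad i = 1, \dots, n-1. \tag{119} \]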

By analogy with the proof of Theorem 3 we obtain the stability of the system (111), (115), (116)–(119) with respect to initial data and the right-hand side, which we state without proof. For this purpose we introduce the norm
\[ \|u^h\|_{1,\rho}^2 = h^2 \sum_{i=1}^{n-1} \left( \sum_{j=0}^{n-1} \rho_{j+1/2} \bigl(u^h_{i,j+1/2}\bigr)^2 + \rho_0 \bigl(u^h_{i,0}\bigr)^2 + \rho_n \bigl(u^h_{i,n}\bigr)^2 \right). \tag{120} \]
Theorem 4. When the condition
\[ g_1^{k+1} = 0 \quad \text{on } \Gamma_x^h \tag{121} \]
is valid, for the system (111), (115), (116)–(119) the a priori estimate
\[ \|u^h\|_{1,\rho} \le \|w^h\|_{1,\rho} + \frac{\tau}{4} \|f_1^{k+1}\|_{1,\rho} \tag{122} \]
holds for any grid function $a \in H_x$. □

Now we consider the problem (9)–(11) for the second component of velocity:


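\[ \frac{\partial v_2}{\partial t} - \nu \Delta v_2 + \frac{1}{2} (u_k^\tau \cdot \nabla) v_2 + \frac{1}{2} \operatorname{div}(v_2 u_k^\tau) = \frac{1}{2} f_2 \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{123} \]
\[ v_2 = g_2 \quad \text{on } \Gamma \times (t_k, t_{k+1}), \tag{124} \]
\[ v_2(x, y, t_k) = u^\tau_{2,k}(x, y) \quad \text{in } \Omega. \tag{125} \]
We split this fractional step further in the $x$- and $y$-directions:
\[ \frac{\partial \bar u}{\partial t} - \nu \frac{\partial^2 \bar u}{\partial x^2} + \frac{1}{2} a \frac{\partial \bar u}{\partial x} + \frac{1}{2} \frac{\partial (a \bar u)}{\partial x} = \frac{1}{4} f_2 \quad \text{in } \Omega \times (t_k, t_{k+1}), \tag{126} \]
\[ \bar u = g_2 \quad \text{on } \Gamma_x \times (t_k, t_{k+1}), \tag{127} \]
\[ \bar u(x, y, t_k) = b(x, y) \quad \text{on } \bar\Omega; \tag{128} \]
and
\[ \frac{\partial \bar w}{\partial t} - \nu \frac{\partial^2 \bar w}{\partial y^2} + \frac{1}{2} b \frac{\partial \bar w}{\partial y} + \frac{1}{2} \frac{\partial (b \bar w)}{\partial y} = \frac{1}{4} f_2 \quad \text{in } \Omega \times (t_k, t_{k+1}], \tag{129} \]
\[ \bar w = g_2 \quad \text{on } \Gamma_y \times (t_k, t_{k+1}), \tag{130} \]
\[ \bar w(x, y, t_k) = \bar u(x, y, t_{k+1}) \quad \text{on } \bar\Omega. \tag{131} \]
After solving both problems we get
\[ v_2(x, y, t_{k+1}) = \bar w(x, y, t_{k+1}) \quad \text{on } \bar\Omega. \tag{132} \]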

The equations (126) and (129) are identical to the equations (75) and (73) respectively, while the problems differ only in the right-hand side. Therefore we do not repeat the computations for the discretization of the two new problems. The difference in the discretization of the new problems consists in the use of different (geometrically shifted) subspaces for $u$ and $\bar u$, $w$ and $\bar w$. From the geometric point of view the problems (73)–(74) and (126)–(128) are closer to each other, up to the replacement of $y$ by $x$ and $b$ by $a$.

To realize the approach with small fictitious domains we first consider the extended domain $\Omega_1 = (0, 1) \times (-h/2, 1 + h/2)$ and smoothly continue the equation (73) into the additional strips $\Omega_1 \setminus \Omega$ through the boundary $\Gamma_y$, using Taylor expansions of the functions on the left-hand side of (73). Recomputing the right-hand side of (73) with the help of the extended functions in the fictitious domains, we find that equation (78) is valid in the extended domain $\Omega_1$. With these extensions we obtain the following Galerkin formulation instead of (84): find $w^h \in H_x$ which satisfies the boundary condition
\[ w^h_{i,j} = g^{k+1}_{1,i,j}, \quad i = 0, \dots, n, \quad j = 0, n, \tag{133} \]
and the integral relation
\[ \frac{1}{\tau} (w^h, v)_{\Omega_1} + \nu \left(\frac{\partial w^h}{\partial y}, \frac{\partial v}{\partial y}\right)_{\Omega_1} + \frac{1}{2} \left(b \frac{\partial w^h}{\partial y}, v\right)_{\Omega_1} - \frac{1}{2} \left(b w^h, \frac{\partial v}{\partial y}\right)_{\Omega_1} = \frac{1}{\tau} (a, v)_{\Omega_1} + \frac{1}{4} \bigl(f_1^{k+1}, v\bigr)_{\Omega_1} \tag{134} \]


for an arbitrary function $v \in H_x$ which satisfies the boundary condition
\[ v_{i,j} = 0, \quad i = 0, \dots, n, \quad j = 0, n. \tag{135} \]
On closer examination we can see that the unknowns $w^h_{i,j+1/2}$ at the boundary $\Gamma_x$ can be eliminated from further algorithmic considerations by use of the boundary conditions, for example,
\[ a_{i,j+1/2} = g^k_{1,i,j+1/2}, \quad b^*_{i,j} = g^k_{2,i,j}, \quad \text{etc.} \tag{136} \]
at the boundary nodes. This allows us to solve the system of equations (89), (96), (100) and to exclude the 6 different types of equations (90), (91), (97)–(99), (101). The system of algebraic equations becomes more uniform, and we shall use this simplification in our numerical experiment. The situation is analogous in the other subproblems.

4 Numerical experiment

For the numerical experiment we take the problem (1)–(4) with the parameters $\nu = 0.01$, $T = 1$, and the following data:
\[ f_1(x, y, t) = 0, \quad f_2(x, y, t) = 0 \quad \text{on } \Omega \times (0, T); \]
\[ g_1(x, y, t) = -\cos(\pi x) \sin(\pi y) \exp(-2\pi^2 \nu t), \quad g_2(x, y, t) = \sin(\pi x) \cos(\pi y) \exp(-2\pi^2 \nu t) \quad \text{on } \Gamma \times (0, T); \]
\[ u_{0,1}(x, y) = -\cos(\pi x) \sin(\pi y), \quad u_{0,2}(x, y) = \sin(\pi x) \cos(\pi y) \quad \text{on } \Omega. \]
The solution of this problem is
\[ u_1(x, y, t) = -\cos(\pi x) \sin(\pi y) \exp(-2\pi^2 \nu t), \]
\[ u_2(x, y, t) = \sin(\pi x) \cos(\pi y) \exp(-2\pi^2 \nu t), \]
\[ p(x, y, t) = -0.25 \bigl(\cos(2\pi x) + \cos(2\pi y)\bigr) \exp(-4\pi^2 \nu t) \]
on $\bar\Omega \times [0, T]$. The graphs of the functions $u_1(x, y, t)$, $p(x, y, t)$ at time $t = 1$ are presented in Figs. 7 and 9.

First we consider the error for pressure. Fig. 12 demonstrates that the order of convergence is $\tau^{1/2} + h^2$ in the discrete $L_2$-norm. In Fig. 10 we see the artificial numerical boundary layer, especially in the corners, which is usual for splitting methods. It originates from incorrect boundary conditions of Neumann type for pressure. For example, at a point $(x, 0) \in \Gamma$ the initial equation (7) gives
\[ \frac{\partial p}{\partial n} = f_2 - \frac{\partial u_2}{\partial t} + \nu \Delta u_2 - \frac{1}{2} (u \cdot \nabla) u_2 - \frac{1}{2} \operatorname{div}(u_2 u). \]
At the fractional step of pressure action, however, on the basis of (19) and the previous considerations the split problem yields an equality equivalent to
\[ \frac{\partial p}{\partial n} = f_2 - \frac{\partial u_2}{\partial t}. \]
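As a quick sanity check of the test problem, the exact solution can be evaluated directly. The sketch below uses only the formulas for $u_1$, $u_2$, $p$ and the value $\nu = 0.01$ stated in this section; the function names, the sampling grid of 65 × 65 points, and the finite-difference step are illustrative assumptions.

```python
import math

NU = 0.01  # viscosity used in the numerical experiment


def exact(x, y, t, nu=NU):
    """Exact velocity components and pressure of the test problem."""
    decay = math.exp(-2.0 * math.pi ** 2 * nu * t)
    u1 = -math.cos(math.pi * x) * math.sin(math.pi * y) * decay
    u2 = math.sin(math.pi * x) * math.cos(math.pi * y) * decay
    p = -0.25 * (math.cos(2 * math.pi * x) + math.cos(2 * math.pi * y)) * decay ** 2
    return u1, u2, p


def divergence(x, y, t, eps=1e-5):
    """Central-difference estimate of du1/dx + du2/dy (should vanish)."""
    du1dx = (exact(x + eps, y, t)[0] - exact(x - eps, y, t)[0]) / (2 * eps)
    du2dy = (exact(x, y + eps, t)[1] - exact(x, y - eps, t)[1]) / (2 * eps)
    return du1dx + du2dy


# Sample the solution at t = 1 on a uniform grid (grid size is an assumption).
n = 64
pts = [i / n for i in range(n + 1)]
u1_max = max(abs(exact(x, y, 1.0)[0]) for x in pts for y in pts)
p_max = max(exact(x, y, 1.0)[2] for x in pts for y in pts)
print(u1_max, p_max, divergence(0.3, 0.7, 0.5))
```

At $t = 1$ the velocity amplitude is $\exp(-2\pi^2 \nu) \approx 0.82$ and the pressure maximum is $0.5 \exp(-4\pi^2 \nu) \approx 0.34$, which agrees with the value ranges reported for Figs. 7 and 9.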

Fig. 7. Values of the first component of velocity (surface over x, y; range from −8.2×10⁻¹ to 8.2×10⁻¹)

Fig. 8. Errors for the first component of velocity (surface over x, y; range from −1.02×10⁻³ to 1.02×10⁻³)

Thus the boundary condition contains an error of order O(1). Fortunately, this discrepancy produces an error in pressure only in a narrow boundary layer, which gives a small error in the $L_2$-norm and in its discrete analogue.

Fig. 9. The pressure values (surface over x, y; range from −3.35×10⁻¹ to 3.37×10⁻¹)

Fig. 10. Errors for the pressure (surface over x, y; range from −1.08×10⁻³ to 1.53×10⁻³)

Fig. 11 demonstrates the order of convergence $\tau + h^2$ for the first component of velocity in the discrete $L_2$-norm. The error for this velocity component also has artificial numerical boundary layers, as demonstrated by Fig. 8. Here the artificial boundary layers originate from the splitting into geometrical directions and are produced by the non-simultaneous use of boundary conditions. For example,

Fig. 11. The dependence of the maximal (over t) $L_2(\Omega)$-norm of the u-error on h, τ (error values between 1.28×10⁻⁴ and 7.34×10⁻³)

Fig. 12. The dependence of the maximal (over t) $L_2(\Omega)$-norm of the pressure error on h, τ (error values between 4.37×10⁻⁴ and 1.49×10⁻²)

in problem (73)–(74) for $u_1$ we first use only the boundary condition $u_1 = g_1$ on $\Gamma_y \times [t_k, t_{k+1}]$. This means that after the fractional step in the y-direction $u_1$ in general does not satisfy the boundary condition $u_1 = g_1$ on $\Gamma_x \times [t_k, t_{k+1}]$, with a discrepancy of order $\tau$. Therefore at the next fractional step in the x-direction (equations (75)–(76)) we have
\[ \lim_{x \to 0} \frac{\partial u_1}{\partial t}(x, y) = \frac{\partial u}{\partial t}(0, y) + O(1). \]
Again this produces an approximation error of order O(1) in a thin vicinity of $\Gamma_x$, which results in an artificial boundary layer of amplitude $O(\tau)$. Analogously we get a thin artificial boundary layer for $u_2$ in the vicinity of $\Gamma_y$. Of course, these small-amplitude boundary layers still give accuracy consistent with the truncation error. But they prevent the use of Richardson extrapolation for increasing the order of accuracy, because the error of the approximate solution has an irregular character.

5 Conclusions

An important advantage of the splitting method is the reduction of a problem with the complete operator to several simpler problems at each time step: a Poisson-like problem for pressure and four families of one-dimensional problems (after discretization in time). The operator of the Poisson-like problem is symmetric and positive definite and has constant coefficients, which allows the use of many efficient algorithms. The resulting algebraic complexity is of the order of N arithmetical operations, where N ≈ cn² is the number of unknowns.

The main disadvantage of the splitting method is the artificial boundary layers produced by inaccurate boundary conditions. As noted above, they have comparatively small amplitude but irregular character, and they do not allow the accuracy to be increased by Richardson extrapolation. Of course, there are several papers (see, for example, [12], [19] and the references therein) in which the amplitude of the artificial boundary layers is somewhat reduced by a more accurate treatment of boundary conditions. Other ways to get the second order of convergence in time are the Crank–Nicolson approach and the Θ-scheme [8], [5], [9]. But in principle, Richardson extrapolation for a regular truncation error and a stable scheme allows any finite order of convergence, for example, third or fourth. Such a regular truncation error is given by the fully implicit scheme. Therefore at the next stage of our joint work we shall use the fully implicit scheme to ensure an increase of the convergence order at least in τ.

The main advantage of staggered meshes is the automatic fulfilment of the LBB condition for pressure stability [2]. In recent years, however, another approach has become popular: filtering the spurious modes. The main idea is to use a single square mesh with bilinear finite elements for velocities and piecewise constant elements for pressure. This scheme becomes stable when the approximate solution is orthogonalized to the local spurious modes [1], [2], [10]. In principle, this orthogonalization reduces the number of degrees of freedom for pressure from n² to 3/4 n² in a 2D problem. For a 3D problem this loss is even smaller: 7/8 n³ instead of n³ [10]. The algebraic complexity increases by only 2n² arithmetical operations due to orthogonalization. But the advantage of a single mesh is evident. Coding for a single mesh is simpler even in a 2D problem. In the vicinity of a curvilinear boundary this argument becomes crucial, since staggered meshes lead to several different approximations of the domain, which is problematic from both theoretical and practical points of view. Therefore at the next stage of our joint work we shall use a single mesh with filtering of local spurious modes instead of staggered meshes.

It appears that a square mesh is more appropriate for our problem. First, in a 2D domain the number of squares is half the number of triangles. For a 3D domain this ratio is usually between 5 and 6, so simplex elements require more work. Second, the quadrature formulae are simpler for a square than for a triangle, and the difference is more significant for 3D elements. But triangles give better possibilities to approximate a curvilinear boundary. Therefore at the next stage we shall use a combination of a square mesh in the domain with triangular elements in a thin vicinity of the curvilinear boundary. Of course, with locally refined meshes in an adaptive approach we get a "nonconforming approach" from the elemental point of view. But from the nodal point of view this approach, with the division of a square into m² equal squares, is conforming and presents no difficulties in theoretical justification or practical assembling.

References

1. Boland J, Nicolaides R (1983) SIAM J Numer Anal 20:722–731
2. Brezzi F, Fortin M (1991) Mixed and Hybrid Finite Element Methods. Springer, New York
3. Chorin A (1967) J Comp Phys 2:12–26
4. Chorin A (1969) Math Comp 23:341–353
5. Glowinski R (1987) Le Θ-scheme. In: Bristeau M, Glowinski R, Periaux J (eds) Numerical methods for Navier-Stokes equations
6. Hackbusch W (1985) Multigrid Methods and Applications. Springer, Berlin
7. Heywood J, Rannacher R (1982) SIAM J Numer Anal 19:275–311
8. Heywood J, Rannacher R (1990) SIAM J Numer Anal 27:353–384
9. Kloucek P, Rys F (1994) SIAM J Numer Anal 31:1312–1336
10. Mansfield L (1984) Numer Math 45:165–172
11. Marchuk G, Shaidurov V (1983) Difference Methods and Their Extrapolations. Springer, New York
12. Prohl A (1997) Projection and Quasi-Compressibility Methods for Solving the Incompressible Navier-Stokes Equations. Teubner, Stuttgart
13. Rannacher R (2000) Finite Element Methods for the Incompressible Navier-Stokes Equations. In: Galdi G, Heywood J, Rannacher R (eds) Fundamental Directions in Mathematical Fluid Mechanics. Birkhäuser, Berlin
14. Rüde U (1994) Multilevel, extrapolation, and sparse grid methods. In: Proceedings of the Fourth European Conference on Multigrid Methods. Boston
15. Turek S (1999) Efficient Solvers for Incompressible Flow Problems. Springer, Berlin Heidelberg
16. Shaidurov V (1995) Multigrid Methods for Finite Elements. Kluwer Academic Publishers, Dordrecht
17. Shen J (1996) Math Comp 65:1039–1065
18. Temam R (1979) Navier-Stokes Equations. Theory and Numerical Analysis. North-Holland Publishing Company, Amsterdam
19. Van Kan (1986) SIAM J Sci Stat Comp 7:870–891

Methods of shock wave calculation

V.F. Kuropatenko

Russian Federal Nuclear Center, P.O. Box 245, 456770 Snezhinsk, Russia
[email protected]

Summary. Certain manipulations of the mass, momentum and energy conservation laws, written in the form of partial differential equations for an ideal non-heat-conducting medium, yield a corollary stating that entropy is conserved along the particle trajectory. The conservation laws on the surface of a strong shock are algebraic equations showing that entropy grows across the shock wave. This is the fundamental difference between a shock wave and a continuous solution. We will discuss only those shock wave methods that treat the strong discontinuity as a layer of finite width comparable with the size of the mesh cell (the shock is smeared over an interval of finite length, called the distraction). Since the states behind and before the shock are related, there must exist a mechanism that ensures the growth of entropy in the shock distraction region. Only four principally different mechanisms of energy dissipation in the distraction region are known [1]–[4]. We consider the four shock wave methods corresponding to these four mechanisms. Many difference schemes can be used to implement them. I suggest that we look only at those that were proposed by the authors of these four methods [1]–[4]. B.L. Rozhdestvensky and N.N. Yanenko [5] were the first to try to compare these methods, focusing on approximation and stability. In this presentation I will focus on energy dissipation, shock distraction and monotonicity.

1 Neumann–Richtmyer method

The basic idea of the method is that energy dissipation and the distraction of a strong shock over several mesh cells are provided by adding an artificial viscosity term to the differential equations of motion and energy [1]. Ref. [1] proposes the artificial viscosity term in the form
\[ q = -\frac{C^2 \Delta x_0^2}{V} \frac{\partial U}{\partial x_0} \left| \frac{\partial U}{\partial x_0} \right| \tag{1} \]
and offers a difference scheme that was later slightly modified in [6]. Difference schemes with the artificial viscosity term may differ, as may the expressions for $q$ [7, 8]. The difference schemes may be either explicit or implicit. But given the presence of the artificial viscosity term, all such schemes are implementations of the Neumann–Richtmyer method. In the difference scheme proposed in [1], thermodynamic quantities are defined at the centers of mesh intervals in $m$, and velocities and coordinates are defined at mesh nodes. The equations in [6] are written as:
\[ \frac{U_i^{n+1} - U_i^n}{\tau} + \frac{P^n_{i+0.5} + q^n_{i+0.5} - P^n_{i-0.5} - q^n_{i-0.5}}{h} = 0, \]
\[ x_i^{n+1} = x_i^n + \tau U_i^{n+1}, \qquad V^{n+1}_{i+0.5} = \frac{x^{n+1}_{i+1} - x^{n+1}_i}{h}, \qquad h = \frac{x^n_{i+1} - x^n_i}{V^n_{i+0.5}}, \]
\[ q^{n+1}_{i+0.5} = \begin{cases} \dfrac{k^2}{V^{n+1}_{i+0.5}} \bigl(U^{n+1}_{i+1} - U^{n+1}_i\bigr)^2 & \text{for } U^{n+1}_{i+1} - U^{n+1}_i < 0, \\ 0 & \text{for } U^{n+1}_{i+1} - U^{n+1}_i \ge 0, \end{cases} \]
\[ E^{n+1}_{i+0.5} - E^n_{i+0.5} + \left( \frac{P^{n+1}_{i+0.5} + P^n_{i+0.5}}{2} + q^{n+1}_{i+0.5} \right) \bigl(V^{n+1}_{i+0.5} - V^n_{i+0.5}\bigr) = 0, \tag{2} \]
\[ P^{n+1}_{i+0.5} = P\bigl(V^{n+1}_{i+0.5}, E^{n+1}_{i+0.5}\bigr). \tag{3} \]

Equations (2) and (3) form a system of non-linear equations for $P^{n+1}$ and $E^{n+1}$. The method is conditionally stable. The admissible ratio between the time and space steps, æ = aτ/h, depends on an empirical constant $k$, and according to [6] the actual stability condition is æ ≤ 0.25.
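The switch in the artificial viscosity (active only under compression) is easy to state in code. The sketch below assumes the quadratic form $q = k^2 (\Delta U)^2 / V$ for $\Delta U < 0$, which is how the flattened formula is read here and which is consistent with form (8); the function name and arguments are hypothetical.

```python
def artificial_viscosity(u_left, u_right, v_cell, k):
    """Quadratic artificial viscosity for one mesh interval.

    u_left, u_right : node velocities U_i, U_{i+1}
    v_cell          : specific volume V_{i+0.5} of the interval
    k               : empirical viscosity constant
    Returns q >= 0; q vanishes when the interval expands.
    """
    du = u_right - u_left
    if du < 0.0:              # compression: viscosity switched on
        return k * k * du * du / v_cell
    return 0.0                # expansion or uniform motion: no viscosity


q_compress = artificial_viscosity(1.0, 0.0, 0.5, 2.0)
q_expand = artificial_viscosity(0.0, 1.0, 0.5, 2.0)
```

Because $q$ grows quadratically with the velocity jump, it is negligible in smooth flow and large only inside the smeared shock layer.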

Ref. [1] proposes a method of shock distraction analysis. For this purpose the artificial viscosity term $q$ in form (1) is added and the self-similar variable $\xi = m - Wt$ is introduced. This yields
\[ W V' + U' = 0, \tag{4} \]
\[ W U' - (P + q)' = 0, \tag{5} \]
\[ E' + (P + q) V' = 0, \tag{6} \]
where the prime denotes differentiation with respect to $\xi$. For the ideal gas
\[ P V = (\gamma - 1) E \tag{7} \]
and $q$ taken in the form
\[ q = \frac{k^2 h^2 W^2}{V} (V')^2, \tag{8} \]
equations (4)–(8) reduce to a single equation for $V$:
\[ 2 k^2 h^2 \left( \frac{dV}{d\xi} \right)^2 + (\gamma + 1)(V - V_0)^2 + 2 V_0 (V - V_0) = 0. \tag{9} \]
Its solution is
\[ \xi = \pm k h \sqrt{\frac{2}{\gamma + 1}} \arcsin\left( \gamma - (\gamma + 1) \frac{V}{V_0} \right). \]
For $V = V_0$, $\xi = \xi_0 = \frac{3 k h \pi}{2} \sqrt{\frac{2}{\gamma + 1}}$, and for $V = V_1$, respectively,
\[ \xi = \xi_1 = -k h \sqrt{\frac{2}{\gamma + 1}} \arcsin\left( \gamma - (\gamma + 1) \frac{V_1}{V_0} \right). \]
The maximum compression $V_1 = \frac{\gamma - 1}{\gamma + 1} V_0$ is achieved across the infinite shock with $P_0 = 0$. In this case $\xi_1 = -\frac{k h \pi}{2} \sqrt{\frac{2}{\gamma + 1}}$. So, the width of the shock layer, $\Delta\xi$, and the strong shock distraction, $D$, in the Neumann–Richtmyer method are:
\[ \Delta\xi = \xi_0 - \xi_1 = 2 k h \pi \sqrt{\frac{2}{\gamma + 1}}, \qquad D_{NR} = \frac{\Delta\xi}{h} = 2 k \pi \sqrt{\frac{2}{\gamma + 1}}. \]
The effective distraction, $D^e_{NR}$, is determined by finding the points where the straight line tangent to $V(\xi)$ with the maximum slope
\[ V'_M = \frac{V_0}{k h \sqrt{2 (\gamma + 1)}} \]
intersects the levels $V_0$ and $V_1$:
\[ \Delta\xi = \frac{V_0 - V_1}{V'_M}. \]
Substituting the expression for $V'_M$ and the minimum specific volume
\[ V_1 = \frac{\gamma - 1}{\gamma + 1} V_0, \]
and dividing by $h$ yield
\[ D^e_{NR} = 2 k \sqrt{\frac{2}{\gamma + 1}}. \tag{10} \]
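The effective width (10) can be checked numerically from the profile equation (9): the maximum of $|dV/d\xi|$ over $V_1 \le V \le V_0$ gives $V'_M$, and $(V_0 - V_1)/(h V'_M)$ should reproduce $2k\sqrt{2/(\gamma+1)}$. The sketch below does this by brute-force sampling; the function names and sample count are illustrative assumptions.

```python
import math


def max_slope_nr(v0, gamma, k, h, samples=10001):
    """Largest |dV/dxi| allowed by equation (9) on V1 <= V <= V0."""
    v1 = (gamma - 1.0) / (gamma + 1.0) * v0
    best = 0.0
    for j in range(samples):
        v = v1 + (v0 - v1) * j / (samples - 1)
        # (9) solved for (dV/dxi)^2; clipped at zero against round-off.
        slope_sq = -((gamma + 1.0) * (v - v0) ** 2
                     + 2.0 * v0 * (v - v0)) / (2.0 * k * k * h * h)
        best = max(best, math.sqrt(max(slope_sq, 0.0)))
    return best


def effective_width_nr(gamma, k):
    """Closed-form effective distraction (10), in units of h."""
    return 2.0 * k * math.sqrt(2.0 / (gamma + 1.0))


v0, gamma, k, h = 1.0, 1.4, 2.0, 0.01
v1 = (gamma - 1.0) / (gamma + 1.0) * v0
numeric = (v0 - v1) / (max_slope_nr(v0, gamma, k, h) * h)
closed_form = effective_width_nr(gamma, k)
print(numeric, closed_form)
```

The result depends only on $k$ and $\gamma$, not on $h$ or $V_0$, which is why the shock occupies a fixed number of mesh cells regardless of mesh refinement.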


2 Lax method

The basic idea of this method [2] is that energy dissipation is provided by the principal terms of the approximation errors. Later this method was called the approximation viscosity method. Difference equations are obtained by integrating the conservation laws over the mesh cell and applying the mean-value theorem:
\[ \frac{V^{n+1}_{i+0.5} - V^n_{i+0.5}}{\tau} - \frac{U^*_{i+1} - U^*_i}{h} = 0, \tag{11} \]
\[ \frac{U^{n+1}_{i+0.5} - U^n_{i+0.5}}{\tau} + \frac{P^*_{i+1} - P^*_i}{h} = 0, \tag{12} \]
\[ \frac{\varepsilon^{n+1}_{i+0.5} - \varepsilon^n_{i+0.5}}{\tau} + \frac{(PU)^*_{i+1} - (PU)^*_i}{h} = 0, \tag{13} \]
\[ E^{n+1}_{i+0.5} = \varepsilon^{n+1}_{i+0.5} - 0.5 \bigl(U^{n+1}_{i+0.5}\bigr)^2, \tag{14} \]

where the values of the sought functions $V^n_{i+0.5}$, $P^n_{i+0.5}$, $U^n_{i+0.5}$, $E^n_{i+0.5}$, and $\varepsilon^n_{i+0.5}$ are defined at the centers of mesh intervals in $m$ at times $t^n$, and the auxiliary quantities $P^*_i$, $U^*_i$ and $(PU)^*_i$ are defined at the centers of the time steps $\tau$, at the faces of the mesh cells with coordinates $m_i$. Equations (11)–(14) are general until equations for $U^*_i$, $P^*_i$ and $(PU)^*_i$ are specified. Ref. [2] proposes a difference scheme that defines the auxiliary quantities $U^*$ and $P^*$ across shocks and continuous solutions by the following equations:
\[ U^*_i = \frac{1}{2} \bigl(U^n_{i+0.5} + U^n_{i-0.5}\bigr) + \frac{h}{2\tau} \bigl(V^n_{i+0.5} - V^n_{i-0.5}\bigr), \tag{15} \]
\[ P^*_i = \frac{1}{2} \bigl(P^n_{i+0.5} + P^n_{i-0.5}\bigr) - \frac{h}{2\tau} \bigl(U^n_{i+0.5} - U^n_{i-0.5}\bigr), \tag{16} \]
\[ (PU)^*_i = \frac{1}{2} \bigl((PU)^n_{i+0.5} + (PU)^n_{i-0.5}\bigr) - \frac{h}{2\tau} \bigl(\varepsilon^n_{i+0.5} - \varepsilon^n_{i-0.5}\bigr). \tag{17} \]
Difference equations (11)–(13) and equations (15)–(17) for the auxiliary quantities approximate the differential conservation laws with the approximation errors
\[ \omega_1 = -\frac{1}{2} \frac{\partial^2 V}{\partial t^2} \tau + \frac{1}{2} \frac{\partial^2 V}{\partial m^2} \frac{h^2}{\tau} + O(\tau^2, h^2), \tag{18} \]
\[ \omega_2 = -\frac{1}{2} \frac{\partial^2 U}{\partial t^2} \tau + \frac{1}{2} \frac{\partial^2 U}{\partial m^2} \frac{h^2}{\tau} + O(\tau^2, h^2), \tag{19} \]
\[ \omega_3 = -\frac{1}{2} \frac{\partial^2 \varepsilon}{\partial t^2} \tau + \frac{1}{2} \frac{\partial^2 \varepsilon}{\partial m^2} \frac{h^2}{\tau} + O(\tau^2, h^2). \tag{20} \]
When $h \to 0$ and $\tau = \text{const}$, the associated terms in (18)–(20) tend to zero. The situation is worse with $\tau$. When $\tau \to 0$, the terms proportional to $h^2/\tau$ in (18)–(20) tend to zero only if $\lim_{\tau \to 0,\, h \to 0} h^2/\tau = 0$. If not, equations (11)–(13) do not converge to the initial differential equations, because the reduction of $\tau$ at constant $h$ increases the error. According to [9], the equation of entropy production for difference schemes with independent $\omega_1$, $\omega_2$, and $\omega_3$ is
\[ T \frac{\partial S}{\partial t} = \omega_3 - U \omega_2 + P \omega_1. \tag{21} \]

Substitute Eqs. (18)–(20) into Eq. (21) and, using the differential equations, replace the second time derivatives in Eq. (20) by m-derivatives. Also assume that $\partial S / \partial m \approx 0$; then the entropy production equation takes the form:
\[ T \frac{\partial S}{\partial t} = \frac{h (1 - \text{æ}^2)}{2 a\, \text{æ}} \left[ \left( \frac{\partial U}{\partial t} \right)^2 + a^2 \left( \frac{\partial V}{\partial t} \right)^2 \right] + \dots \]
For æ = τa/h → 0, the rate of entropy production approaches infinity. So, the difference scheme of Lax is extremely dissipative, according to [8].

Consider the distraction of a stationary discontinuity in the Lax method. To this end write difference equations (11)–(13) in the differential form with approximation errors (18)–(20) and go to the variable $\xi = m - Wt$. We obtain
\[ W V' + U' + \frac{h^2}{2\tau} (1 - \text{æ}^2) V'' + O(\tau^2, h^2) = 0, \]
\[ W U' - P' + \frac{h^2}{2\tau} (1 - \text{æ}^2) U'' + O(\tau^2, h^2) = 0, \]
\[ W \varepsilon' - (PU)' + \frac{h^2}{2\tau} (1 - \text{æ}^2) \varepsilon'' + O(\tau^2, h^2) = 0. \]
Integrate these equations with respect to $\xi$. Find the constants of integration at $\xi = +\infty$, where $U = U_0$, $V = V_0$, $P = P_0$, $E = E_0$, $\varepsilon = \frac{1}{2} U_0^2 + E_0$, $V' = 0$, $U' = 0$, $P' = 0$, $\varepsilon' = 0$. This yields
\[ W V + U + A V' - W V_0 - U_0 + O(\tau^2, h^2) = 0, \]
\[ W U - W U_0 - P + P_0 + A U' + O(\tau^2, h^2) = 0, \]
\[ W \varepsilon - W \varepsilon_0 - P U + A \varepsilon' + P_0 U_0 + O(\tau^2, h^2) = 0, \tag{22} \]
where $A = \frac{h^2}{2\tau} (1 - \text{æ}^2)$. Substitute the Clapeyron equation into (22). Then express all quantities in terms of $V$ and the derivatives in terms of $V'$. We obtain an ordinary differential equation for the profile $V(\xi)$:
\[ \frac{(V_0 - V)(V - V_1)}{V} - \frac{4A}{W (\gamma + 1)} \frac{dV}{d\xi} = O(\tau^2, h^2), \tag{23} \]
where
\[ V_1 = V_0 \left[ \frac{\gamma - 1}{\gamma + 1} + \frac{2}{\gamma + 1} \left( \frac{a_0}{W} \right)^2 \right]. \]
Omitting the second-order infinitesimals gives the following solution:
\[ \xi = \frac{2 h^2 (1 - \text{æ}^2)}{\tau W (\gamma + 1)(V_0 - V_1)} \bigl( V_1 \ln(V - V_1) - V_0 \ln(V_0 - V) \bigr). \tag{24} \]
It follows from (24) that $\xi = \xi_0 = +\infty$ for $V = V_0$ and $\xi = \xi_1 = -\infty$ for $V = V_1$. So, the strong shock distraction in the Lax method is infinite: $D_L = \infty$. To determine the effective distraction, differentiate (23) and find $V_M$ and the maximum value $V'_M$ for $V'' = 0$:
\[ V_M = \sqrt{V_0 V_1}, \qquad V'_M = \frac{(\gamma + 1)\, \text{æ}}{2 h (1 - \text{æ}^2)} \left( \sqrt{V_0} - \sqrt{V_1} \right)^2. \tag{25} \]
Using (23) and (10) yields
\[ D^e_L = \frac{2 (1 - \text{æ}^2)}{(\gamma + 1)\, \text{æ}} \cdot \frac{\sqrt{V_0} + \sqrt{V_1}}{\sqrt{V_0} - \sqrt{V_1}}. \tag{26} \]

It is seen from (26) that $D^e_L \to 0$ for æ → 1 and $D^e_L \to \infty$ for æ → 0 or $V_1 \to V_0$.

Finally, check the monotonicity of the Lax scheme. Go from $P$ and $U$ to the invariants:
\[ \alpha = P + aU, \qquad \beta = P - aU. \]
Express $P$ and $U$ in terms of $\alpha$ and $\beta$:
\[ P = 0.5 (\alpha + \beta), \qquad U = 0.5 (\alpha - \beta) / a. \tag{27} \]
For a matter with the EOS
\[ P = a^2 (V_0 - V), \tag{28} \]
replace $V$ by $P$ in Eq. (11). We obtain
\[ P^{n+1}_{i+0.5} = \frac{1}{2} \bigl(P^n_{i+1.5} + P^n_{i-0.5}\bigr) - \frac{1}{2} \frac{\tau a^2}{h} \bigl(U^n_{i+1.5} - U^n_{i-0.5}\bigr). \tag{29} \]
Substituting (27) in (29) and (12) yields
\[ \alpha^{n+1}_{i+0.5} + \beta^{n+1}_{i+0.5} = 0.5\, \alpha^n_{i-0.5} (1 + \text{æ}) + 0.5\, \alpha^n_{i+1.5} (1 - \text{æ}) + 0.5\, \beta^n_{i-0.5} (1 - \text{æ}) + 0.5\, \beta^n_{i+1.5} (1 + \text{æ}), \tag{30} \]
\[ \alpha^{n+1}_{i+0.5} - \beta^{n+1}_{i+0.5} = 0.5\, \alpha^n_{i-0.5} (1 + \text{æ}) + 0.5\, \alpha^n_{i+1.5} (1 - \text{æ}) - 0.5\, \beta^n_{i-0.5} (1 - \text{æ}) - 0.5\, \beta^n_{i+1.5} (1 + \text{æ}). \tag{31} \]
Sum (30) and (31), and then subtract (31) from (30):
\[ \alpha^{n+1}_{i+0.5} = 0.5 (1 - \text{æ})\, \alpha^n_{i+1.5} + 0.5 (1 + \text{æ})\, \alpha^n_{i-0.5}, \qquad \beta^{n+1}_{i+0.5} = 0.5 (1 + \text{æ})\, \beta^n_{i+1.5} + 0.5 (1 - \text{æ})\, \beta^n_{i-0.5}. \tag{32} \]
It follows from (32) that for 0 ≤ æ ≤ 1 all coefficients of the invariants in the right-hand sides are nonnegative, and hence the difference scheme of Lax is monotonic by the Godunov theorem.
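The update (32) is a convex combination of neighbouring values whenever 0 ≤ æ ≤ 1, so it cannot create values outside the initial range. A minimal sketch that applies (32) to step profiles illustrates this; periodic boundaries and the function name `lax_step` are assumptions made only to keep the example self-contained.

```python
def lax_step(alpha, beta, cfl):
    """One step of the Lax scheme in invariants, following (32).

    alpha, beta : lists of cell values of the invariants
    cfl         : the ratio ae = a * tau / h, expected in [0, 1]
    Periodic indexing is an assumption made for this illustration.
    """
    n = len(alpha)
    new_alpha = [0.5 * (1.0 - cfl) * alpha[(i + 1) % n]
                 + 0.5 * (1.0 + cfl) * alpha[(i - 1) % n] for i in range(n)]
    new_beta = [0.5 * (1.0 + cfl) * beta[(i + 1) % n]
                + 0.5 * (1.0 - cfl) * beta[(i - 1) % n] for i in range(n)]
    return new_alpha, new_beta


alpha = [1.0] * 10 + [0.0] * 10   # right-going invariant with a jump
beta = [0.0] * 10 + [1.0] * 10    # left-going invariant with a jump
for _ in range(7):
    alpha, beta = lax_step(alpha, beta, 0.6)
print(min(alpha), max(alpha), min(beta), max(beta))
```

The jump spreads over more cells with every step, in line with the infinite distraction $D_L$ found above, but no overshoot or undershoot appears.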

3 Godunov method

In this method all quantities that characterize the response of media to loads are defined at the centers of mesh intervals in $m$. The coordinates $x_i$ are defined at mesh nodes. The difference equations are written in forms (11)–(13). The auxiliary quantities $P^*_i$, $U^*_i$ are defined as follows. All tabular functions at time $t^n$ are assumed piecewise constant. Therefore, arbitrary discontinuities appear at the nodes. They split at $t > t^n$. The pressures and velocities across the contact discontinuity are taken as the auxiliary quantities. If an arbitrary discontinuity is such that a shock wave propagates to the right of $x_i$ and a rarefaction wave to the left, then the equations for the quantities across the contact discontinuity are
\[ P^*_i + a^n_{i-0.5} U^*_i = P^n_{i-0.5} + a^n_{i-0.5} U^n_{i-0.5}, \]
\[ P^*_i - W_{i+0.5} U^*_i = P^n_{i+0.5} - W_{i+0.5} U^n_{i+0.5}. \]
Generally $W_{i+0.5}$ depends on $P^*_i$ and $U^*_i$ because the problem of discontinuity splitting is non-linear. However, for a weak shock with $W_{i+0.5} = a + O(h)$, $a_{i-0.5} = a + O(h)$, the equations for $P^*_i$, $U^*_i$ take the form
\[ P^*_i = 0.5 \bigl(P^n_{i+0.5} + P^n_{i-0.5}\bigr) - 0.5\, a \bigl(U^n_{i+0.5} - U^n_{i-0.5}\bigr), \tag{33} \]
\[ U^*_i = 0.5 \bigl(U^n_{i+0.5} + U^n_{i-0.5}\bigr) - 0.5 \bigl(P^n_{i+0.5} - P^n_{i-0.5}\bigr) / a. \tag{34} \]
Write difference equations (11)–(13), (33) and (34) in the differential form. The approximation errors $\omega_1$, $\omega_2$, and $\omega_3$ are:
\[ \omega_1 = -\frac{\tau}{2} \frac{\partial^2 V}{\partial t^2} - \frac{h}{2a} \frac{\partial^2 P}{\partial m^2} + O(\tau^2, h^2), \]
\[ \omega_2 = -\frac{\tau}{2} \frac{\partial^2 U}{\partial t^2} + \frac{ah}{2} \frac{\partial^2 U}{\partial m^2} + O(\tau^2, h^2), \]
\[ \omega_3 = -\frac{\tau}{2} \frac{\partial^2 \varepsilon}{\partial t^2} + \frac{ah}{2} \frac{\partial^2 U}{\partial m^2} U + \frac{ah}{2} \left( \frac{\partial U}{\partial m} \right)^2 + \frac{h}{2a} \frac{\partial^2 P}{\partial m^2} P + \frac{h}{2a} \left( \frac{\partial P}{\partial m} \right)^2 + O(\tau^2, h^2). \]

Since $\omega_1$, $\omega_2$, and $\omega_3$ are independent, then, by [9], the right-hand side of the entropy equation is in form (21). Substitute $\omega_1$, $\omega_2$, and $\omega_3$ in (21). Then, using the differential conservation laws and their derivatives, replace the time derivatives by m-derivatives. This gives the following equation of entropy production:
\[ T \frac{\partial S}{\partial t} = \frac{h}{2W} (1 - \text{æ}) \left[ \left( \frac{\partial P}{\partial m} \right)^2 + W^2 \left( \frac{\partial U}{\partial m} \right)^2 \right] + O(\tau^2, h^2). \tag{35} \]
It follows from (35) that for æ = τa/h < 1 this difference scheme, being an acoustic approximation to the Godunov scheme, is extremely dissipative. Since the principal term in the right-hand side of Eq. (35) is nonnegative, entropy grows across both shock and rarefaction waves. The rate of entropy production is limited and achieves its maximum at æ = 0:
\[ T \frac{\partial S}{\partial t} < \frac{h}{2W} \left[ \left( \frac{\partial P}{\partial m} \right)^2 + W^2 \left( \frac{\partial U}{\partial m} \right)^2 \right]. \]
Analyze the shock distraction. To this end go to the self-similar variable $\xi = m - Wt$ and write the difference equations in the differential form:
\[ W V' + U' - \frac{\tau W^2}{2} V'' - \frac{h}{2W} P'' + O(\tau^2, h^2) = 0, \]
\[ W U' - P' - \frac{\tau W^2}{2} U'' + \frac{hW}{2} U'' + O(\tau^2, h^2) = 0, \]
\[ W \varepsilon' - (PU)' - \frac{\tau W^2}{2} \varepsilon'' + \frac{hW}{2} (U U')' + \frac{h}{2W} (P P')' + O(\tau^2, h^2) = 0. \]
Integrating with respect to $\xi$ and eliminating $P$, $U$, $\varepsilon$, $P'$, $U'$, $\varepsilon'$ gives the following equation for $V(\xi)$ for the ideal gas:
\[ \frac{(V - V_0)(V - V_1)}{V} + \frac{2h (1 - \text{æ})}{\gamma + 1} \cdot \frac{dV}{d\xi} + O(\tau^2, h^2) = 0. \]
Its solution is
\[ \xi = \frac{2h (1 - \text{æ})}{(\gamma + 1)(V_0 - V_1)} \bigl( V_1 \ln(V - V_1) - V_0 \ln(V_0 - V) \bigr). \]
From this equation: $\xi = \xi_0 = +\infty$ for $V = V_0$, and $\xi = \xi_1 = -\infty$ for $V = V_1$. So, in the Godunov method, the shock distraction for æ < 1 is infinite: $D_G = \infty$, and for æ = 1, $D_G = 0$.

The effective distraction is obtained in the same manner as in the Lax method:
\[ D^e_G = \frac{2}{\gamma + 1} (1 - \text{æ}) \frac{\sqrt{V_0} + \sqrt{V_1}}{\sqrt{V_0} - \sqrt{V_1}}. \]
To check the Godunov scheme for monotonicity, go to the invariants. Express $P$ and $U$ in terms of $\alpha$ and $\beta$, and for equation of state (28) replace $V$ by $P$ in Eq. (11). For $a = W$, we obtain
\[ \alpha^{n+1}_{i+0.5} + \beta^{n+1}_{i+0.5} = \alpha^n_{i+0.5} + \beta^n_{i+0.5} + \text{æ} \bigl( \beta^n_{i+1.5} - \alpha^n_{i+0.5} - \beta^n_{i+0.5} + \alpha^n_{i-0.5} \bigr), \tag{36} \]
\[ \alpha^{n+1}_{i+0.5} - \beta^{n+1}_{i+0.5} = \alpha^n_{i+0.5} - \beta^n_{i+0.5} + \text{æ} \bigl( -\beta^n_{i+1.5} - \alpha^n_{i+0.5} + \beta^n_{i+0.5} + \alpha^n_{i-0.5} \bigr). \tag{37} \]
Summing (36) and (37), and subtracting (37) from (36), give the equations for $\alpha$ and $\beta$:
\[ \alpha^{n+1}_{i+0.5} = \alpha^n_{i+0.5} (1 - \text{æ}) + \alpha^n_{i-0.5}\, \text{æ}, \qquad \beta^{n+1}_{i+0.5} = \beta^n_{i+0.5} (1 - \text{æ}) + \beta^n_{i+1.5}\, \text{æ}. \]
For 0 ≤ æ ≤ 1, all coefficients of $\alpha$ and $\beta$ are nonnegative and, by the Godunov theorem, the difference scheme, being an acoustic approximation of the Godunov scheme, is monotonic.
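In invariants, the acoustic approximation of the Godunov scheme is a first-order upwind update: each invariant takes a convex combination of its own cell and the neighbour on the side the wave arrives from. The sketch below follows directly from (36) and (37), with the left-going invariant taking its neighbour value from cell i+1.5. Periodic boundaries and the function name are assumptions made for illustration.

```python
def godunov_acoustic_step(alpha, beta, cfl):
    """One upwind step for the invariants, derived from (36) and (37).

    alpha : right-going invariant, updated from the left neighbour
    beta  : left-going invariant, updated from the right neighbour
    cfl   : the ratio ae = a * tau / h, expected in [0, 1]
    Periodic indexing is an assumption made for this illustration.
    """
    n = len(alpha)
    new_alpha = [(1.0 - cfl) * alpha[i] + cfl * alpha[(i - 1) % n]
                 for i in range(n)]
    new_beta = [(1.0 - cfl) * beta[i] + cfl * beta[(i + 1) % n]
                for i in range(n)]
    return new_alpha, new_beta


# With ae = 1 the profiles are shifted by exactly one cell without smearing.
shifted_alpha, shifted_beta = godunov_acoustic_step(
    [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], 1.0)

# With ae < 1 a step profile is smeared but stays within its initial bounds.
smeared_alpha, _ = godunov_acoustic_step(
    [1.0] * 5 + [0.0] * 5, [0.0] * 10, 0.5)
```

The exact shift at æ = 1 corresponds to the zero distraction $D_G = 0$ noted above, while any æ < 1 introduces smearing.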

4 Kuropatenko method [4]

The basic idea of this method is as follows. All mesh intervals (basic and auxiliary) are assigned to one of two types depending on the solution: compression or rarefaction. The former is treated as shock compression defined by a local shock wave (only within the current interval). The states before and behind the shock wave are related by the conservation laws:
\[ P_1 - P_0 - W (U_1 - U_0) = 0, \tag{38} \]
\[ U_1 - U_0 + W (V_1 - V_0) = 0, \tag{39} \]
\[ P_1 U_1 - P_0 U_0 - W (E_1 - E_0) - \frac{W}{2} \bigl(U_1^2 - U_0^2\bigr) = 0. \tag{40} \]
The state before the shock ($P_0$, $V_0$, $E_0$, $U_0$) is the solution in the mesh interval. One of the quantities, either on the boundary or in the neighboring interval, is taken as the quantity behind the shock. The other quantities behind the shock are determined from Eqs. (38)–(40) and the equation of state. They are taken as auxiliary quantities. For example, if $U_1$ is specified [4], then $P_1$, $V_1$, $E_1$, and $W$ are sought from Eqs. (38)–(40), or if $P_1$ is specified [10, 11, 12], then $V_1$, $E_1$, $U_1$, and $W$ are sought. The method can be implemented on different meshes [4], [9]–[14]. We discuss two of them.


4.1 Non-divergent scheme

The meshes proposed in [4] for velocity and for thermodynamic quantities differ. The quantities $P$, $V$, and $E$ are defined at the centers of mass intervals, and velocities are defined at the nodes $t^n$, $m_i$. For a compression wave, the difference equations take the form:
\[ \frac{U_i^{n+1} - U_i^n}{\tau} + \frac{\bar P^n_{i+0.5} - \bar P^n_{i-0.5}}{h} = 0, \tag{41} \]
\[ x_i^{n+1} = x_i^n + \tau U_i^{n+1}, \tag{42} \]
\[ V^{n+1}_{i+0.5} = \frac{x^{n+1}_{i+1} - x^{n+1}_i}{h}, \qquad h = \frac{x^n_{i+1} - x^n_i}{V^n_{i+0.5}}, \tag{43} \]
\[ E^{n+1}_{i+0.5} - E^n_{i+0.5} + 0.5 \bigl( \bar P^{n+1}_{i+0.5} + \bar P^n_{i+0.5} \bigr) \bigl( V^{n+1}_{i+0.5} - V^n_{i+0.5} \bigr) = 0. \tag{44} \]

The dynamic pressure $\bar P$ is the solution of these equations across a strong shock. As the state before the shock, we take the quantities in the mesh interval at time $t^n$,

$$V_0 = V_{i+0.5}^n, \quad P_0 = P_{i+0.5}^n, \quad E_0 = E_{i+0.5}^n,$$

and as the velocity jump we take the difference of $U$ at the nodes at time $t^{n+1}$:

$$\Delta U = |U_1 - U_0| = \left|U_{i+1}^{n+1} - U_i^{n+1}\right|.$$

Substituting these quantities into the equations for the strong shock yields

$$\bar P_{i+0.5}^{n+1} = P_{i+0.5}^n - W\left(U_{i+1}^{n+1} - U_i^{n+1}\right), \tag{45}$$

where $W$ depends on $P_0$, $V_0$, $E_0$ and $\Delta U$. For a simple equation of state for condensed matter,

$$P = (\gamma - 1)\rho E + C_{0k}^2(\rho - \rho_{0k}),$$

Eq. (45) takes the form

$$\bar P_{i+0.5}^{n+1} = P_{i+0.5}^n + b\,\Delta U^2 + \sqrt{\left(b\,\Delta U^2\right)^2 + \left(a_{i+0.5}^n\right)^2 \Delta U^2}, \tag{46}$$

where $b = \frac{\gamma+1}{4}\rho_{i+0.5}^n$. Eq. (46) has two asymptotes:

1. Weak shock, $b\,\Delta U \ll a_{i+0.5}^n$. In this case the dynamic pressure is a linear function of $\Delta U$:

$$\bar P_{i+0.5}^{n+1} \approx P_{i+0.5}^n + a_{i+0.5}^n \Delta U. \tag{47}$$


2. Strong shock, $b\,\Delta U \gg a_{i+0.5}^n$. In this case the function is quadratic:

$$\bar P_{i+0.5}^{n+1} \approx P_{i+0.5}^n + \frac{\gamma+1}{2}\rho_{i+0.5}^n \Delta U^2. \tag{48}$$
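The two asymptotes of Eq. (46) can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and `a` stands for the mass sound speed $a_{i+0.5}^n$ of the cell.

```python
import math

def dynamic_pressure(P, rho, a, dU, gamma=1.4):
    """Eq. (46): dynamic pressure for a velocity jump dU >= 0."""
    b = 0.25 * (gamma + 1.0) * rho
    return P + b * dU**2 + math.sqrt((b * dU**2) ** 2 + (a * dU) ** 2)

def weak_limit(P, a, dU):
    """Eq. (47): linear in dU, valid for b*dU << a."""
    return P + a * dU

def strong_limit(P, rho, dU, gamma=1.4):
    """Eq. (48): quadratic in dU, valid for b*dU >> a."""
    return P + 0.5 * (gamma + 1.0) * rho * dU**2
```

Evaluating `dynamic_pressure` for a very small and a very large `dU` and comparing the pressure increments with `weak_limit` and `strong_limit` shows the linear and the quadratic regime, which is the behaviour a linear-quadratic artificial viscosity imitates.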

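Using these asymptotes, M. Wilkins [8] introduced a linear-quadratic artificial viscosity.

Taylor series expansion of all quantities in Eqs. (41)-(44) gives the independent approximation errors:

$$\omega_2 = -\frac{\tau}{2}\frac{\partial^2 U}{\partial t^2} + \tau\frac{\partial^2 P}{\partial t\,\partial m} + hW\frac{\partial^2 U}{\partial m^2} + O\left(\tau^2, h^2\right), \tag{49}$$

$$\omega_4 = \frac{\tau}{2}\frac{\partial U}{\partial t} - \frac{\tau^2}{6}\frac{\partial^2 U}{\partial t^2} + O\left(\tau^3\right), \tag{50}$$

$$\omega_5 = -\frac{h^2}{24}\frac{\partial^3 x}{\partial m^3} + O\left(h^3\right), \tag{51}$$

$$\omega_7 = -\frac{\tau}{2}\left(\frac{\partial^2 E}{\partial t^2} + P\frac{\partial^2 V}{\partial t^2} - \frac{\partial P}{\partial t}\frac{\partial V}{\partial t}\right) + hW\frac{\partial U}{\partial m}\frac{\partial V}{\partial t} + O\left(\tau^2, h^2, \tau h\right). \tag{52}$$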

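Differentiating (42) and (43) with respect to $t$ and $m$, and using the equation

$$\frac{\partial P}{\partial t} + a^2\frac{\partial U}{\partial m} = \omega_{10},$$

we write $\omega_7$ as

$$\omega_7 = hW\left(1 - \text{æ}\,\frac{a}{W}\right)\left(\frac{\partial V}{\partial t}\right)^2 + O\left(\tau^2, h^2\right).$$

Since $\omega_7$ is independent, the entropy production equation for $W = a + O(\tau, h)$ takes the form:

$$T\left(\frac{\partial S}{\partial t}\right)_m = hW(1 - \text{æ})\left(\frac{\partial V}{\partial t}\right)^2 + O\left(\tau^2, h^2\right).$$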
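The step from Eq. (52) to the compact form of $\omega_7$ is not spelled out in the text. One possible route is sketched here. It assumes that the leading-order relations $\partial E/\partial t = -P\,\partial V/\partial t$, $\partial U/\partial m = \partial V/\partial t$ and $\partial P/\partial t = -a^2\,\partial U/\partial m$ may be used inside the error terms, and that $\text{æ} = a\tau/h$; higher-order terms are abbreviated by dots.

$$
\begin{aligned}
\frac{\partial^2 E}{\partial t^2}
  &= -\frac{\partial P}{\partial t}\frac{\partial V}{\partial t}
     - P\frac{\partial^2 V}{\partial t^2}, \\
\omega_7
  &= -\frac{\tau}{2}\left(\frac{\partial^2 E}{\partial t^2}
     + P\frac{\partial^2 V}{\partial t^2}
     - \frac{\partial P}{\partial t}\frac{\partial V}{\partial t}\right)
     + hW\frac{\partial U}{\partial m}\frac{\partial V}{\partial t} + \dots \\
  &= \tau\frac{\partial P}{\partial t}\frac{\partial V}{\partial t}
     + hW\left(\frac{\partial V}{\partial t}\right)^2 + \dots \\
  &= \left(hW - \tau a^2\right)\left(\frac{\partial V}{\partial t}\right)^2 + \dots
   = hW\left(1 - \text{æ}\,\frac{a}{W}\right)\left(\frac{\partial V}{\partial t}\right)^2 + \dots
\end{aligned}
$$

With $W = a + O(\tau, h)$ the factor in parentheses reduces to $1 - \text{æ}$, so under these assumptions the entropy production is nonnegative for $\text{æ} \le 1$.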

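What is the distraction in this non-divergent scheme? As before, we pass to the self-similar variable $\xi = m - Wt$; a prime denotes differentiation with respect to $\xi$. The differential conservation laws with the approximation errors (49)-(52) become

$$WU' - P' - \frac{\tau W^2}{2}U'' + hWU'' - \tau WP'' + O\left(\tau^2, h^2\right) = 0, \tag{53}$$

$$Wx' + U - \frac{\tau W}{2}U' + O\left(\tau^2\right) = 0, \tag{54}$$

$$x' - V + O\left(\tau^2\right) = 0, \tag{55}$$

$$E' + PV' - \frac{\tau W}{2}\left(E'' - P'V' + PV''\right) - hWV'U' + O\left(\tau^2, h^2\right) = 0. \tag{56}$$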


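Differentiating (53)-(56), eliminating $x'$, $E'$, $U'$, $P'$, and integrating with respect to $\xi$, we obtain a differential equation for $V(\xi)$ that is identical to the equation for the Godunov scheme. Thus, the first differential approximation of the Kuropatenko non-divergent scheme has the same distraction as that of the Godunov scheme.

Is the scheme monotonic? For the equation of state (28), the consequence of Eqs. (41)-(44) across the compression wave can be written as

$$P_{i+0.5}^{n+1} - P_{i+0.5}^n + \frac{\tau a^2}{h}\left(U_{i+1}^{n+1} - U_i^{n+1}\right) = 0. \tag{57}$$

Substituting (45) into (41) yields

$$U_i^{n+1} - U_i^n + \frac{\tau}{h}\left(P_{i+0.5}^{n-1} - P_{i-0.5}^{n-1}\right) - \text{æ}\left(U_{i+1}^n - 2U_i^n + U_{i-1}^n\right) = 0. \tag{58}$$

Substituting (27) into (57) and (58) gives

$$\alpha_{i+0.5}^{n+1} + \beta_{i+0.5}^{n+1} + \text{æ}\left(\alpha_{i+1}^{n+1} - \beta_{i+1}^{n+1}\right) - \text{æ}\left(\alpha_i^{n+1} - \beta_i^{n+1}\right) = \alpha_{i+0.5}^n + \beta_{i+0.5}^n, \tag{59}$$

$$\begin{aligned}
\alpha_i^{n+1} - \beta_i^{n+1} = {} & \alpha_i^n - \beta_i^n - \text{æ}\left(\alpha_{i+0.5}^{n-1} + \beta_{i+0.5}^{n-1}\right) + \text{æ}\left(\alpha_{i-0.5}^{n-1} + \beta_{i-0.5}^{n-1}\right) \\
& + \text{æ}\left(\alpha_{i+1}^n - \beta_{i+1}^n\right) - 2\text{æ}\left(\alpha_i^n - \beta_i^n\right) + \text{æ}\left(\alpha_{i-1}^n - \beta_{i-1}^n\right).
\end{aligned} \tag{60}$$

Write Eq. (60) for $i+1$ and multiply it by $-\text{æ}$, then multiply Eq. (60) by $\text{æ}$, and add both to Eq. (59). For $\beta = \text{const}$, we obtain

$$\begin{aligned}
\alpha_{i+0.5}^{n+1} = {} & \alpha_{i+0.5}^n + \left(3\text{æ}^2 - \text{æ}\right)\left(\alpha_{i+1}^n - \alpha_i^n\right) - \text{æ}^2\left(\alpha_{i+2}^n - \alpha_{i-1}^n\right) \\
& + \text{æ}^2\alpha_{i+1.5}^{n-1} - 2\text{æ}^2\alpha_{i+0.5}^{n-1} + \text{æ}^2\alpha_{i-0.5}^{n-1}.
\end{aligned}$$

Taking the Taylor series expansions of all $\alpha$ on the right-hand side, we obtain the following equation:

$$\alpha_{i+0.5}^{n+1} = \alpha_{i+0.5}^n - \text{æ}h\left(\frac{\partial\alpha}{\partial m}\right)_{i+0.5} + \text{æ}^2h^2\left(\frac{\partial^2\alpha}{\partial m^2}\right)_{i+0.5} + O\left(h^3\right). \tag{61}$$

Decrease the index by 1 and subtract the result from (61). Then take the Taylor series expansions at $t^n$ and $m_i$ of all quantities on the right-hand side of the resulting equation. This gives

$$\Delta_i^{n+1} = \alpha_{i+0.5}^{n+1} - \alpha_{i-0.5}^{n+1} = h\left[\left(\frac{\partial\alpha}{\partial m}\right)_i - \text{æ}h\left(\frac{\partial^2\alpha}{\partial m^2}\right)_i\right] + O\left(h^3\right). \tag{62}$$

For $\beta = \text{const}$, the compression wave propagates in the positive direction. Since $\alpha' \le 0$ and $\alpha'' \le 0$ on the back side of the compression wave, it follows from (62) that $\Delta_i^n \le 0$ for $\tau \approx 0$ ($\text{æ} \approx 0$). For $\Delta_i^n$ to remain nonpositive, the following condition must be satisfied:

$$\left|\frac{\partial\alpha}{\partial m}\right| - \tau a\left|\frac{\partial^2\alpha}{\partial m^2}\right| \ge 0.$$

So the scheme is conditionally monotonic.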
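The elimination that combines Eqs. (59) and (60) for $\beta = \text{const}$ is purely algebraic, so it can be checked numerically. The sketch below does this for arbitrary smooth test profiles; the profiles, the function name and the default values of `ae` (standing for æ), `h` and `m` are assumptions made for the illustration only.

```python
import math

def check_combined_update(ae=0.4, h=0.1, m=0.3):
    """Compare the two-step update (59)+(60) with the combined formula.

    alpha_now and alpha_old are arbitrary smooth profiles standing in for
    alpha at time levels n and n-1 (hypothetical data, not from the paper).
    """
    def alpha_now(x):
        return math.tanh(3.0 * x)

    def alpha_old(x):
        return math.tanh(3.0 * x + 0.2)

    def node_update(x):
        # Eq. (60) with beta = const: alpha^{n+1} at a node located at x
        return (alpha_now(x)
                - ae * (alpha_old(x + 0.5 * h) - alpha_old(x - 0.5 * h))
                + ae * (alpha_now(x + h) - 2.0 * alpha_now(x) + alpha_now(x - h)))

    # Eq. (59) with beta = const, solved for alpha^{n+1} at the cell center;
    # m is the node m_i, so the cell center is m + h/2.
    c = m + 0.5 * h
    two_step = alpha_now(c) - ae * (node_update(m + h) - node_update(m))

    combined = (alpha_now(c)
                + (3.0 * ae**2 - ae) * (alpha_now(m + h) - alpha_now(m))
                - ae**2 * (alpha_now(m + 2.0 * h) - alpha_now(m - h))
                + ae**2 * (alpha_old(m + 1.5 * h)
                           - 2.0 * alpha_old(m + 0.5 * h)
                           + alpha_old(m - 0.5 * h)))
    return two_step, combined
```

Both returned values agree to rounding error for any choice of profiles, which confirms the coefficients $3\text{æ}^2 - \text{æ}$ and $\text{æ}^2$ of the combined equation.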


4.2 Divergent scheme [10]

All thermodynamic quantities and velocities are defined at the centers of the mesh intervals, and the mesh nodes have coordinates $t^n$ and $m_i$. The difference equations have the form (11)-(14). To define the auxiliary quantities $P_i^*$, $U_i^*$, the solution in the auxiliary interval $m_{i-0.5} \le m \le m_{i+0.5}$ is assigned to one of two types: rarefaction or compression.

Compression wave. The auxiliary quantities are found from equations (38)-(40) for the strong shock surface. The quantities across the discontinuity are defined as follows. If $U_{i+0.5}^n - U_{i-0.5}^n < 0$, then

1. $U_1 = U_{i-0.5}^n$, $(P, V, E, U)_0 = (P, V, E, U)_{i+0.5}^n$ for $P_{i-0.5}^n > P_{i+0.5}^n$,
2. $U_1 = U_{i+0.5}^n$, $(P, V, E, U)_0 = (P, V, E, U)_{i-0.5}^n$ for $P_{i-0.5}^n < P_{i+0.5}^n$.

All other quantities with subscript 1 are found from (38)-(40). If only $W > 0$ is considered, then $P_i^*$, $U_i^*$ are defined by the equations

$$U_i^* = U_{i-0.5}^n, \qquad P_i^* = P_{i+0.5}^n - W\left(U_{i+0.5}^n - U_{i-0.5}^n\right). \tag{63}$$

We now check the monotonicity of this scheme across the compression wave. The difference equations with the auxiliary quantities (63) take the form:

$$P_{i+0.5}^{n+1} = P_{i+0.5}^n - \frac{\tau a^2}{h}\left(U_{i+0.5}^n - U_{i-0.5}^n\right),$$

$$U_{i+0.5}^{n+1} = U_{i+0.5}^n - \frac{\tau}{h}\left[P_{i+1.5}^n - P_{i+0.5}^n - a\left(U_{i+1.5}^n - 2U_{i+0.5}^n + U_{i-0.5}^n\right)\right].$$

Replace $P$ and $U$ by their expressions in terms of the invariants $\alpha$ and $\beta$:

$$\alpha_{i+0.5}^{n+1} + \beta_{i+0.5}^{n+1} = \alpha_{i+0.5}^n + \beta_{i+0.5}^n - \text{æ}\left(\alpha_{i+0.5}^n - \beta_{i+0.5}^n - \alpha_{i-0.5}^n + \beta_{i-0.5}^n\right),$$

$$\begin{aligned}
\alpha_{i+0.5}^{n+1} - \beta_{i+0.5}^{n+1} = {} & \alpha_{i+0.5}^n - \beta_{i+0.5}^n - \text{æ}\left(\alpha_{i+1.5}^n + \beta_{i+1.5}^n - \alpha_{i+0.5}^n - \beta_{i+0.5}^n\right) \\
& + \text{æ}\left(\alpha_{i+1.5}^n - \beta_{i+1.5}^n - 2\alpha_{i+0.5}^n + 2\beta_{i+0.5}^n + \alpha_{i-0.5}^n - \beta_{i-0.5}^n\right).
\end{aligned}$$

Adding these equations and dividing by two gives

$$\alpha_{i+0.5}^{n+1} = \alpha_{i+0.5}^n(1 - \text{æ}) + \text{æ}\,\alpha_{i-0.5}^n - \text{æ}\,\beta_{i+1.5}^n + 2\text{æ}\,\beta_{i+0.5}^n - \text{æ}\,\beta_{i-0.5}^n. \tag{64}$$

If $\beta = \text{const}$, the $\beta$ terms cancel and Eq. (64) takes the form

$$\alpha_{i+0.5}^{n+1} = \alpha_{i+0.5}^n(1 - \text{æ}) + \alpha_{i-0.5}^n\,\text{æ}.$$

Both coefficients are positive for $0 < \text{æ} < 1$, and hence the divergent scheme [10], [12] is monotonic across the compression wave.

Now consider the shock distraction. To this end, we write the difference conservation laws (11)-(14) with the auxiliary quantities (63) in differential form with approximation errors:


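$$\omega_1 = -\frac{\tau}{2}\frac{\partial^2 V}{\partial t^2} - \frac{h}{2}\frac{\partial^2 U}{\partial m^2} + O\left(\tau^2, h^2\right),$$

$$\omega_2 = -\frac{\tau}{2}\frac{\partial^2 U}{\partial t^2} - \frac{h}{2}\frac{\partial^2 P}{\partial m^2} + hW\frac{\partial^2 U}{\partial m^2} + O\left(\tau^2, h^2\right),$$

$$\omega_3 = -\frac{\tau}{2}\frac{\partial^2 \varepsilon}{\partial t^2} - \frac{h}{2}\frac{\partial}{\partial m}\left(U\frac{\partial P}{\partial m}\right) + \frac{h}{2}\frac{\partial}{\partial m}\left(P\frac{\partial U}{\partial m}\right) + hW\frac{\partial}{\partial m}\left(U\frac{\partial U}{\partial m}\right) + O\left(\tau^2, h^2\right).$$

We pass to the self-similar variable $\xi = m - Wt$. Then the equations take the form

$$WV' + U' - \frac{\tau W^2}{2}V'' - \frac{h}{2}U'' + O\left(\tau^2, h^2\right) = 0, \tag{65}$$

$$WU' - P' - \frac{\tau W^2}{2}U'' - \frac{h}{2}P'' + hWU'' + O\left(\tau^2, h^2\right) = 0, \tag{66}$$

$$W\varepsilon' - (PU)' - \frac{\tau W}{2}(PU)'' - \frac{h}{2}(UP')' + \frac{h}{2}(PU')' + hW(UU')' + O\left(\tau^2, h^2\right) = 0. \tag{67}$$

Integrating with respect to $\xi$ gives

$$WV + U - \frac{\tau W^2}{2}V' - \frac{h}{2}U' = WV_0 + U_0 + O\left(\tau^2, h^2\right), \tag{68}$$

$$WU - P - \frac{\tau W^2}{2}U' - \frac{h}{2}P' + hWU' = WU_0 - P_0 + O\left(\tau^2, h^2\right), \tag{69}$$

$$W\varepsilon - PU - \frac{\tau W}{2}(PU)' - \frac{h}{2}UP' + \frac{h}{2}PU' + hWUU' = W\varepsilon_0 - P_0U_0 + O\left(\tau^2, h^2\right). \tag{70}$$

Using (65)-(67), replace $U'$ and $P'$ in (68)-(70) by $V'$. Then, using (68)-(70), replace $U$ and $P$ by $V$. We obtain an equation describing the profile $V(\xi)$ for the ideal gas. The equation is identical to that for the Godunov scheme. Therefore, the distraction and the effective distraction in this scheme are identical to $D_G$ and $D_G^e$.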
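The monotonicity argument for the divergent scheme can be reproduced in a few lines. The sketch below uses exact rational arithmetic to form half the sum of the two invariant equations of this section and compares it with a closed form in which the coefficient of $\beta_{i+0.5}^n$ is $2\text{æ}$, the value for which the $\beta$ terms cancel when $\beta = \text{const}$. It then applies the resulting two-point update to a non-increasing profile. All names are hypothetical, `ae` stands for æ, and holding the left boundary value fixed is an assumption of the illustration.

```python
from fractions import Fraction

def summed_alpha_update(ae, alpha, beta):
    """Half the sum of the two invariant equations of Section 4.2.

    alpha, beta: dicts keyed by 'm' (i-0.5), 'c' (i+0.5), 'p' (i+1.5).
    """
    p_new = (alpha['c'] + beta['c']
             - ae * (alpha['c'] - beta['c'] - alpha['m'] + beta['m']))
    au_new = (alpha['c'] - beta['c']
              - ae * (alpha['p'] + beta['p'] - alpha['c'] - beta['c'])
              + ae * (alpha['p'] - beta['p'] - 2 * alpha['c'] + 2 * beta['c']
                      + alpha['m'] - beta['m']))
    return (p_new + au_new) / 2

def eq64(ae, alpha, beta):
    """Closed form of the summed equation, Eq. (64)."""
    return ((1 - ae) * alpha['c'] + ae * alpha['m']
            - ae * beta['p'] + 2 * ae * beta['c'] - ae * beta['m'])

def upwind_step(alpha, ae):
    """Eq. (64) with beta = const; the left boundary value is held fixed."""
    return [alpha[0]] + [(1 - ae) * alpha[k] + ae * alpha[k - 1]
                         for k in range(1, len(alpha))]

def is_monotone(values):
    """True if the sequence is non-increasing."""
    return all(x >= y for x, y in zip(values, values[1:]))
```

For example, `upwind_step([1.0, 1.0, 0.8, 0.3, 0.0, 0.0], 0.5)` stays non-increasing, whereas the same profile with `ae = 1.5` develops an overshoot, because one of the two coefficients becomes negative.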

5 Other difference schemes

5.1 Lax-Wendroff scheme

The scheme of Lax and Wendroff [15], [16] is worth considering because of its rather wide use. Lax and Wendroff proposed that the auxiliary quantities $P_i^*$, $U_i^*$ in (11)-(13) be defined as

$$P_i^* = P_i^n - \frac{\tau}{2h}\left(a_i^n\right)^2\left(U_{i+0.5}^n - U_{i-0.5}^n\right) - \frac{B}{4}\left|a_{i+0.5}^n - a_{i-0.5}^n\right|\left(U_{i+0.5}^n - U_{i-0.5}^n\right),$$

$$U_i^* = U_i^n - \frac{\tau}{2h}\left(P_{i+0.5}^n - P_{i-0.5}^n\right) - \frac{B}{4\left(a_i^n\right)^2}\left|a_{i+0.5}^n - a_{i-0.5}^n\right|\left(P_{i+0.5}^n - P_{i-0.5}^n\right),$$

$$(PU)_i^* = (PU)_i^n - \left[P_i^n\left(P_{i+0.5}^n - P_{i-0.5}^n\right) + \left(a_i^n\right)^2U_i^n\left(U_{i+0.5}^n - U_{i-0.5}^n\right)\right] \times \left[\frac{\tau}{2h} + \frac{B}{4\left(a_i^n\right)^2}\left|a_{i+0.5}^n - a_{i-0.5}^n\right|\right],$$

where

$$P_i^n = \frac{1}{2}\left(P_{i+0.5}^n + P_{i-0.5}^n\right), \qquad a_i^n = \frac{1}{2}\left(a_{i+0.5}^n + a_{i-0.5}^n\right),$$

$$U_i^n = \frac{1}{2}\left(U_{i+0.5}^n + U_{i-0.5}^n\right), \qquad (PU)_i^n = \frac{1}{2}\left((PU)_{i+0.5}^n + (PU)_{i-0.5}^n\right).$$

Using these equations for shock wave computation is the same as adding three artificial viscosity terms:

$$q_p = -\frac{B}{4}h^2\left|\frac{\partial a}{\partial m}\right|\frac{\partial U}{\partial m}, \qquad q_u = -\frac{B}{4}\frac{h^2}{a^2}\left|\frac{\partial a}{\partial m}\right|\frac{\partial P}{\partial m},$$

$$q_{pu} = -\frac{B}{4}\frac{h^2}{a^2}\left|\frac{\partial a}{\partial m}\right|\left(P\frac{\partial P}{\partial m} + a^2U\frac{\partial U}{\partial m}\right).$$

They are not approximation viscosities, and therefore the Lax-Wendroff scheme is an implementation of the Neumann-Richtmyer method. The scheme has an empirical constant, $B \approx 1$–$2$, which defines the boundary of the stability region. The stability condition is

$$\text{æ}\left(\text{æ} + \frac{1}{2}B\right) \le 1.$$

The scheme is non-monotonic.

5.2 Eulerian difference schemes

These difference schemes are widely used in aerodynamic calculations. Their merits and shortcomings are considered in some detail in [17], [18]. The only point I would like to draw attention to is that all these schemes can be regarded as consisting of two steps. In the first step, the mesh is Lagrangian and one of the shock wave methods in the Lagrangian formulation is used. In the second step, the quantities are recalculated to transfer them from the Lagrangian mesh to the Eulerian one. The solution obtained in the first step permits the approximation of the mass, momentum and energy fluxes across Eulerian cell faces without violating the conservation laws.


5.3 Non-monotony reduction

The solutions obtained can be made monotonic by special methods that smooth them without violating the conservation laws. These methods can be used along with any of the above shock wave methods. As a rule, they are developed without considering the problems of energy dissipation and entropy conservation across continuous solutions.

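6 Conclusion

In conclusion, I would like to compare, in the following table, the basic parameters of the methods discussed above.

Difference schemes

| No. | Parameter | Neumann-Richtmyer | Lax | Godunov | Kuropatenko, non-divergent | Kuropatenko, divergent |
|---|---|---|---|---|---|---|
| 1 | Distraction, $D$ | $2k\pi\sqrt{\dfrac{2}{\gamma+1}}$ | $\infty$ | $\infty$ | $\infty$ | $\infty$ |
| 2 | Effective distraction, $D^e$ | $2k\sqrt{\dfrac{2}{\gamma+1}}$ | $\dfrac{2(1-\text{æ}^2)}{\text{æ}(\gamma+1)}\,\dfrac{\sqrt{V_0}+\sqrt{V_1}}{\sqrt{V_0}-\sqrt{V_1}}$ | $\dfrac{2(1-\text{æ})}{\gamma+1}\,\dfrac{\sqrt{V_0}+\sqrt{V_1}}{\sqrt{V_0}-\sqrt{V_1}}$ | as for Godunov | as for Godunov |
| 3 | Monotonicity | No | Yes | Yes | Conditional | Yes |
| 4 | Empirical constants | $k$ | No | No | No | No |
| 5 | Stability | $\text{æ} \le \dfrac{\sqrt{\gamma}}{2k}$ | $\text{æ} \le 1$ | $\text{æ} \le 1$ | $\text{æ} \le 1$ | $\text{æ} \le 1$ |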

References

1. Neumann J, Richtmyer R (1950) J Appl Phys 21,3:232–237
2. Lax P (1954) Commun Pure Appl Math 7:159–193
3. Godunov S (1959) Collect Math Papers 47(89),3:271–306
4. Kuropatenko V (1960) Transactions USSR Ac Sci 3,4:771
5. Rozhdestvensky B, Yanenko N (1968) Systems of quasi-linear equations and their gas dynamic applications. Nauka, Moscow
6. Richtmyer R, Morton K (1972) Difference methods for initial-value problems. Mir, Moscow
7. Samarsky A, Arsenin V (1961) J Comput Math Math Physics 1,2:357–380
8. Wilkins M (1967) Calculation of elastic-plastic flows. In: Computational Methods in Hydrodynamics. Mir, Moscow
9. Kuropatenko V (1985) J Comput Math Math Physics Moscow 25,8:1176–1188
10. Kuropatenko V (1963) J Comput Math Math Physics 3,1:201–204
11. Kuropatenko V (1966) Difference methods for hydrodynamic equations. In: Proceedings of Steklov Institute of Mathematics, Moscow
12. Kuropatenko V, Makeyeva I (1997) VNIITF Preprint 120
13. Kuropatenko V (1962) High School News, Mathematics 3(28):75–83
14. Kuropatenko V (1967) Transactions Siberian Branch USSR Ac Sci 3:81–82
15. Lax P, Wendroff B (1960) Commun Pure Appl Math 13:217
16. Lax P, Wendroff B (1964) Commun Pure Appl Math 17:381
17. Belotserkovsky O, Davydov Y (1971) J Comput Math Math Physics 11,1:182–207
18. Belotserkovsky O, Davydov Y (1982) A "coarse particle" method of gas dynamics. Numerical Experiment. Nauka, Moscow

Distributed and collaborative visualization of simulation results

U. Lang

High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Allmandring 30, 70550 Stuttgart, Germany

Summary. The visualization group of the High Performance Computing Center Stuttgart (HLRS) has developed a distributed software environment that visualizes simulation results either on a desktop computer or in stereo projection environments. It supports the coupling of ongoing simulations with visualization, thus enabling simulation steering. In addition, multiple engineers or scientists at different locations can discuss the same visualizations and interact with them. The software architecture was designed to make efficient use of distributed computing resources as well as high-speed networking infrastructures. It is explained here together with results of projects in which it was used.

1 Introduction

Scientific visualization is a support technology that enables scientists and engineers to understand complex relationships, typically represented by large amounts of data. The visualization process chain is a part of the overall simulation process chain. Its elements and their interrelationship represent the characteristics of scientific visualization and its usage in different application fields. By combining visualization techniques, datasets can be analyzed and simulation models can be explored. Additionally, engineers can judge complex geometries and use visualization to communicate complex content and support decision processes. Virtual reality techniques can further improve this perception process.

With the rapid advances in hardware technologies, the data volumes resulting from measurement and computing devices increase very fast. Data, as an intermediate carrier of information, cannot be understood immediately by humans. Visualization is the process of converting different forms of information into a visual representation, thus allowing humans to recognize states, structures and behaviour. The term scientific visualization was introduced in 1987 [1]. Since then, visualization has been evolving into a discipline of its own, which has been structured into scientific and information visualization. While information visualization focuses on the visual representation of non-spatially structured information, scientific visualization is mainly oriented towards the visualization of data defined on multidimensional domains.

Visualizing data distributed in 3D space enables humans to make use of their evolutionarily developed capabilities to discern structures at certain locations or to see spatial transitions in structures. This accelerates the comprehension of complex structures or makes it possible in the first place. The spatial recognition capabilities are complemented by further capabilities, such as the recognition of movements and of the relationship between movements. To make use of these capabilities, higher-dimensional datasets need to be mapped into three-dimensional space, while dynamics in content need to be scaled to time scales perceivable by humans.

Many engineering disciplines focus on the development of products that are mainly characterized by their physical shape, like cars, buildings, satellites or bridges. In addition, their behaviour and properties are of importance. Bridges need to be stable, cars should have low fuel consumption and need to be safe, buildings should be energy conserving, etc. While the visual appearance of such objects can be visualized directly, the behaviour and properties have to be mapped into visual representations that can easily be combined with the geometric representation of the objects and thus be understood in their spatial allocation. As most properties do not have a visual representation in reality, a visual metaphor has to be applied that allows an intuitive understanding. This is further complicated if the relationship between different parameters is to be understood.

2 The Visualization Process Chain

Fig. 1. The visualization process chain

The visualization process chain, shown in figure 1, starts with the source process, which either generates or reads data. Instead of a simulation, the source could also be a measurement process or the reading of data that have been produced earlier. Examples of measurement-based data are satellite-borne images or computer tomography datasets in medicine, while simulation data could be a temperature or velocity field defined on a spatial grid.


The filter process either selects or samples data, corrects measurement errors or produces derived information. It is used to focus the attention on certain spatial or data value domains and makes it possible to speed up the processing by working only on smaller samples of the overall datasets. In fluid flow simulations, the vorticity is an example of derived data that is computed from the basic simulation parameters.

The mapping step represents the core of the visualization process. The selected visualization method converts data into abstract visual representations. There is a multitude of visualization algorithms that implement different types of mappings, each with its specific capabilities. In most cases the mapping leads to a collection of geometric primitives such as triangle lists, line lists, point clouds, etc. This is combined with textures and material properties of surfaces. Figures 2 and 3 show two example visualizations of a fluid flow field. In figure 2, particle paths visualize the velocity field of water flowing through a water power plant. Figure 3 shows the colouring of turbine blade surfaces according to the pressure distribution, vector representations of the velocity field, and an isosurface of enthalpy in the flow field of a water vapour turbine.
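A filter that produces derived data can be very small. The following sketch computes the vorticity of a two-dimensional velocity field on a regular grid with central differences. The function name, the nested-list data layout and the treatment of boundary points are assumptions made for this illustration; they do not describe the filter modules of any particular visualization package.

```python
def vorticity_2d(u, v, dx, dy):
    """Return dv/dx - du/dy at the interior points of a regular grid.

    u[j][i], v[j][i] are the velocity components at x = i*dx, y = j*dy.
    Central differences are used; boundary points are left as None.
    """
    ny, nx = len(u), len(u[0])
    w = [[None] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dvdx = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dx)
            dudy = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dy)
            w[j][i] = dvdx - dudy
    return w
```

For a rigid rotation with $u = -y$, $v = x$ the function returns the constant value 2 at all interior points, which a subsequent mapper could then turn into colours or isolines.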

Fig. 2. Visualization of water flow using particle paths (IHS, University of Stuttgart)

In the rendering step, the scene descriptions together with lighting information and camera positions are used to generate images of the scene, which are then displayed. The rendering step can be conceptually separated from the display step. In the display step, series of images can be collected and viewed in fast sequence, thus appearing as a continuously animated representation of the selected content. In a virtual reality environment, the rendering and display steps are typically combined into one step to speed up the processing and to reduce the reaction time to human interactions.


Fig. 3. Visualization of vapor state (ITSM, University of Stuttgart)

Volume rendering, as a special visualization method, conceptually integrates the mapping and rendering steps and bypasses the geometric representations between them. Its input is a scalar field defined on a three-dimensional grid, which is interpreted as a semitransparent medium. Transfer functions for transparency and colouring map the scalar value in each volume element (voxel). These semitransparent coloured voxels are then superimposed to form an image of the overall volume. Multiple algorithms exist to define transfer functions and to accumulate the voxels. The aims are to detect subtle structures and to reduce the processing time. Figure 4 shows internal structures of a metallic motor block. The data has been acquired via computer tomography. In figure 5 the bone and skin of the Visible Human [2] are shown, while all other materials are rendered transparently.

2.1 Interaction and feedback

The visualization process is used for different purposes; accordingly, one differentiates between exploratory visualization and presentation visualization. In exploratory visualization the scientist or engineer does not know beforehand the structures or the behaviour of the system that is simulated. Therefore the visualization toolkit needs to support an incremental exploratory process


Fig. 4. Volume rendering of material structure in an engine (GE)

Fig. 5. Volume rendering of Visible Human Dataset

to search for structures or behaviour. Such a process has a highly interactive character.

In the visualization process chain, the interaction activities are separated into multiple feedback loops. The innermost loop, feeding back into the rendering step, allows the observer position and orientation to be modified, thus enabling free roaming in the 3D scene. Modifying the camera parameters additionally allows zooming into specific details of the scene. As many simulations produce time-dependent results, it is equally important to understand the dynamic behaviour of a system. Human perception requires that the dynamic behaviour be shown on an appropriate time scale. To allow this, the scene content for the different time steps needs to be stored in memory, enabling quick switching between them and thus giving an observer the impression of a smooth change in structure or a smooth movement of objects. An exploratory visualization process is characterized by a repetitive display of the dynamic behaviour while viewing parameters are changed. An interaction concept similar to a video recorder is required to slow down the animation speed, step through time or reverse the direction of the animation.

In the next outer feedback loop, a user can interact with the parameters of a selected mapper. Realistic process chains typically consist of multiple filters and mappers, as can be seen in figure 7. Typical interactions with mappers are, for example, the repositioning of a cutting plane, the definition of new starting positions for particle traces in a flow field, or the definition of a new isovalue of an isosurface. Such interactions are applicable to the 3D visualization of figure 6. Thus a user can locate a specific region where a certain effect appears or an unusual behaviour is observed.

The next outer loop allows interactions with the filtering steps. Filter parameters make it possible to select subdomains of a region together with the values defined on this subdomain. In a search for interesting effects, the location of such a subdomain is moved step by step across the computational domain. An alternative approach to reducing the volume of data to be processed is to sample it down. This speeds up the processing and thus raises the interactivity. When a peculiar behaviour is located in a certain spatial region, the sampling can be removed and a fully detailed extraction of a region of interest can be applied.

Finally, the feedback into the input of the simulation step is called simulation steering. Here a user can see the simulation results change as the simulation evolves. With the immediate feedback, the user can modify boundary conditions and, with a certain delay, see how the behaviour of the simulated system changes. Here, too, the time scale of the human perceptual system is of importance. Changes need to happen within seconds to be perceived as dynamic behaviour.

From the inner to the outer loops, the timing requirements for the reaction of the system become less demanding. When moving through a scene or interacting with other animation parameters, the system should ideally react within 1/30 of a second. This requires an image update rate of at least 30 frames/s, which limits the complexity of the scene. Modifying mapper parameters can already take longer, especially if they need to be applied to a whole sequence of time-dependent data. In such a case a new isosurface would, for example, have to be recalculated for all time steps.
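Two of the quantities mentioned above are easy to make concrete: the data reduction achieved by sampling a grid down, and the time available per frame at a given update rate. The sketch below assumes a 3D scalar field stored as nested lists; both function names are hypothetical.

```python
def subsample(grid, stride):
    """Keep every `stride`-th sample along each axis of a nested-list 3D grid."""
    return [[row[::stride] for row in plane[::stride]] for plane in grid[::stride]]

def frame_budget_ms(frames_per_second=30):
    """Time available per image, in milliseconds, at the given update rate."""
    return 1000.0 / frames_per_second
```

With a stride of 2, a grid of 8 x 8 x 8 samples shrinks to 4 x 4 x 4, one eighth of the original volume, and at 30 frames/s roughly 33 ms remain for producing each image.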


Fig. 6. Climate simulation in a car cabin (data courtesy DaimlerChrysler Research)

Finally, large-scale simulations can take hours or even days to finish. Waiting to see the modified behaviour of such a system after changing an input parameter is rather impractical. Instead, the visualization system collects the result data as it appears over time and keeps it for repeated analysis. Thus the user can repeatedly look into the time-dependent system behaviour. This enables him to interrupt the simulation very early when he recognizes erroneous system behaviour. Alternatively, he can provide other, more appropriate input parameters.

3 COVISE and other scientific visualization packages

Since the introduction of scientific visualization, multiple modular visualization packages have been developed that have strong commonalities but also clearly differ from each other. A common approach is the description of the visualization task via a data flow network paradigm. This reflects the concept of the visualization process chain. A visual program editor allows the topological relationship of the processing steps to be configured graphically. Exchanged data is depicted as edges of a graph connecting the processing steps. Figure 7 shows COVISE [3] as an example of such a package, which has been developed at the High Performance Computing Center Stuttgart (HLRS). OpenDX [4] is


another package with similar characteristics for desktop usage, which has been developed by IBM and has since been released as open source software. Further packages like Khoros, AVS or NAG Explorer also applied data flow networking paradigms. Most of the packages allow a visualization process chain to be executed in a distributed way across multiple machines in a computer network.

Compared to other software integration platforms, visualization packages typically have to be optimized for interactive work. There are different and complementary approaches to achieve this. OpenDX, for example, implements modules as subroutines in one executable. While this reduces the execution overhead, it prevents the parallel and independent execution of modules within the same machine. Systems such as AVS, Khoros, COVISE and NAG Explorer implement the processing steps as separate processes, which allows high flexibility in the distribution of processes across different machines. While there is some execution overhead on the same machine, the operating system automatically executes multiple modules in parallel. On multiprocessor machines this leads to an efficient handling of modules without additional effort by the user.

Large data flow networks can consist of more than a hundred modules. To maintain an overview, many packages allow a whole set of modules to be collapsed into one macro that is visually represented by one icon. Data flow networks can be optimized either for memory usage or for execution speed. For high interactivity the latter is the preferred option. Therefore most packages typically cache the data objects that represent the intermediate state in the process chain, which can lead to a large memory overhead. This is further aggravated if time-dependent data is processed by a system. But in the case of complex filter steps, which then need not be repeated, the delay for any type of interaction in an exploratory visualization is strongly reduced.
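The data flow paradigm and the caching of intermediate data objects can be sketched in a few lines. The toy network below executes its modules in topological order and, after a parameter change, recomputes only the changed module and everything downstream of it. It is a hypothetical illustration using Python's standard `graphlib` module (Python 3.9 or later), not the API of COVISE, OpenDX or any other package named here.

```python
from graphlib import TopologicalSorter

class Network:
    """Toy data flow network: modules are functions, edges carry data objects."""

    def __init__(self):
        self.modules = {}   # name -> (function, list of upstream module names)
        self.params = {}    # name -> parameter value passed to the function
        self.cache = {}     # name -> cached output data object
        self.runs = []      # log of executed modules, for illustration

    def add(self, name, func, inputs=(), param=None):
        self.modules[name] = (func, list(inputs))
        self.params[name] = param

    def _order(self):
        # Map each module to its predecessors and sort topologically
        graph = {n: set(ins) for n, (_, ins) in self.modules.items()}
        return list(TopologicalSorter(graph).static_order())

    def set_param(self, name, value):
        """Change a parameter; invalidate this module and everything downstream."""
        self.params[name] = value
        stale = {name}
        for mod in self._order():
            if any(src in stale for src in self.modules[mod][1]):
                stale.add(mod)
        for mod in stale:
            self.cache.pop(mod, None)

    def execute(self):
        """Run all modules whose output is not cached, upstream modules first."""
        for name in self._order():
            if name in self.cache:
                continue
            func, inputs = self.modules[name]
            args = [self.cache[src] for src in inputs]
            self.cache[name] = func(*args, self.params[name])
            self.runs.append(name)
        return self.cache
```

A chain of a reader, a filter and a mapper is executed once in full. If only the mapper parameter is changed afterwards, the next call to `execute` reruns the mapper alone and reuses the cached filter output, which is the trade of memory for interactivity described above.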
OpenDX further refined this concept by allowing a selective activation of this caching mechanism at different points in a data flow network. When it is known which modules will be reused with changed parameters, one can enable the caching of the input to such a module, so that the data does not have to be processed again from the beginning. COVISE is optimized to make efficient use of the high performance networking infrastructure of a typical high performance computing center by adapting buffer sizes and by using asynchronous communication and assembler routines for data conversion between different machine platforms. Most visualization packages focus on the visualization step, assuming that the simulation has been performed before. COVISE treats the coupling of visualization with an ongoing simulation as equally important. Therefore a specific communication library was implemented that allows an efficient coupling to ongoing simulations on remote supercomputers. Within the visual program editor such a remote simulation appears as a module and is handled no differently from other modules.

Fig. 7. Screen snapshots of modular visualization packages

COVISE can be seen as one example of a visualization system architecture. Figure 8 shows the core processes as well as the modules and how they interrelate. When a COVISE session is initiated, the Mapeditor and the Controller are started. Optionally, an additional remote user can be invited to participate in the session. If the invitation is accepted, a further Mapeditor is initiated on the remote machine. Data manager processes are started on all participating machines. Supercomputers are added as further machines without a Mapeditor.

The Mapeditor enables a user to establish and configure the module network he wants to execute. As soon as he brings up an icon representing a module, the same icon also appears on the workstation screen of a collaboration partner. Additionally, the process is started on the respective platform and enters an event wait. Then the user connects the modules to define an execution sequence and the data exchange. Data flow networks are typically stored after having been set up, so that they can be loaded again when they have to be reused.

When a data flow network is executed, the Controller sends messages to the modules in topological sequence. The messages contain the names of files to be read, the names of data objects to be created or accessed, and the parameter values to be used by the module. Names of data objects are then sent to the data manager on the same machine together with characteristic sizes. This enables the data manager to allocate memory for the data object and to pass a pointer back to the module to access it. When existing data objects need to be accessed, a pointer is passed back immediately. On machines with virtual shared memory, data objects are mapped into the virtual address space of the modules, thus avoiding the copying of large data objects. If data objects


Fig. 8. COVISE software architecture

produced on one machine need to be accessed on another machine, they are copied by the data managers from machine to machine. This provides a flexible environment that can make efficient use of distributed computing and networking hardware, including remote high performance computers, local departmental servers and desktop workstations or local virtual reality hardware.

3.1 3D modeling and usage of texturing

In the engineering sciences, the modeling of object and part geometries is a process typically handled by CAD packages. Depending on the application field, different CAD tools exist. The geometry forms the basis for the simulation of the physical behaviour of an object or part. Properties of interest could be the stiffness of a part, its thermal behaviour, its deformation when external forces are applied or its fluid flow behaviour. To simulate the behaviour, a grid has to be defined that allows the domain of the physical behaviour to be discretized. Based on additional initial and boundary conditions, the calculation determines the time-dependent behaviour of the respective object or part. To visualize the behaviour, the shape of the part or object needs to be displayed at the same time. In figure 6 this is, for example, the cut-open part of the bounding geometry of a car cabin.

Whereas for mechanical engineering different colours on the model surfaces are mostly sufficient, architectural representations depend on the visual representation of surfaces like concrete, wood or plaster. As the atmosphere of a visualization mainly depends on the mapping, one has to put a focus on it. To apply textures and to reduce the number of polygons enough to allow decent frame rates, modeling and animation tools like 3D Studio MAX are used. As most CAD and modeling packages define their own proprietary file format, a common exchange format is required to pass the geometry on to the visualization package. VRML97 has evolved in recent years into such a common file format; it describes the 3D geometry and behaviour of models for the internet. A VRML/VRML97 importer with extended capabilities is integrated into the COVISE renderer COVER. It imports VRML97 models so that the imported geometry and mappings can be combined with the visualization of measurement or simulation data. VRML97 supports interaction and animation, which greatly help users to immerse themselves in the scene.

4 Virtual reality techniques for 3D visualization

A virtual reality impression as described here is produced by a combination of technologies that give a user the feeling of being immersed in a computer generated scene. It is important to cover a large viewing angle, as peripheral vision is an essential element of human perception for the impression of being inside the virtual world. This is best accomplished by setting up a CAVE-like environment, as shown in figure 9, consisting of at least 3 stereoscopic projection walls and a stereo projection floor. To support this immersive impression, the displayed world needs to react immediately to movements of the observer and allow direct interaction with the scene content. A user should be able to grab objects, move them around and perform other interactions that fit the displayed content directly and intuitively. In the example of the climate simulation in the car cabin, the scientist should for instance be able to insert new particles into the air flow and immediately see the paths they follow. Figure 9 shows GIS data of the Zurich area in Switzerland layered over the terrain model. In figure 10 two scientists discuss the temperature distribution inside a car cabin produced by a previous simulation.

Depending on the scenario, further sensory information can be very supportive. For a medical specialist, force feedback is essential during the training of an operation. For an architect or urban planner, the auditory information within a larger building or street strongly improves the sensation of being there. In the real world objects can be moved within constraints, doors can be opened, etc. Users expect objects to behave as they do in the real world. Therefore time dependent, event driven animations of objects are an essential element of a virtual environment. This allows, for example, calling an elevator by pushing a button, which opens and closes doors and carries users to different levels of a building.


U. Lang

Fig. 9. Immersive virtual environment used for GIS terrain visualization

5 Collaborative working

Many visualization packages have been extended toward collaborative working, allowing users at different locations to discuss visualizations as if they were in one room looking at one workstation screen. While such add-on functionality is often difficult to integrate, it has been a principal design concept of COVISE from its inception. Thus it is also inherently available in the extensions that have been added to COVISE later. In COVISE collaborative working is also applied to the steering of ongoing simulations, which is still a highly interactive process that requires fast turnaround times of multiple feedback loops. It is assumed that not only the simulation is performed on a remote high performance computer, but that post processing steps up to the rendering of images might also be executed in a distributed environment. The design of the distributed software architecture has a strong influence on its characteristic behaviour regarding time delays on interactions and responsiveness in a collaboration process, but also regarding scalability with increasing volumes of data.

Distributed and collaborative visualization of simulation results


Fig. 10. Immersive virtual environment used for CFD visualization

5.1 Reaction times of the rendering feedback loop

The highest demand on the reaction time arises when simulation results are visualized in a virtual reality environment such as a CAVE. When a user moves, the whole scene content has to be redrawn from the perspective of the new viewer position with at least 10 to 15 updates per second. In the case of remote rendering, the new viewer position first has to be transmitted to the rendering side, where the new image is generated, compressed, transmitted back to the viewing station, decompressed and finally displayed. The communication delays together with the compression and decompression times, even without considering the rendering times, already exceed the required turnaround time. Therefore typical distributed virtual environments work with local scene graphs, using local graphics hardware for rendering. For collaboration in a distributed virtual environment the positions of the participants are sent out. In a local scene display the other participants are represented by avatars. Thus a delay in updating an avatar position is barely noticeable.

When a desktop workstation is used for visualizing the content, the requirement on the maximum delay until the scene is re-rendered from a new perspective is less demanding than in a virtual environment. At least 3 to 5 frames per


second should be reached, with one frame delay to react to scene interactions. In a collaborative session it is expected that all participants share the same viewer position, providing the same content as a basis for discussion. A variation of one frame does not influence a discussion, while a difference of multiple frames in the discussed visual content might lead to misunderstandings and thus result in unusable working conditions. Given large volumes of time dependent simulation data as well as larger variations in networking bandwidth and delays for different participants in a collaborative session, such differences in the currently visualized content become very likely. Therefore a group collaboration environment for high end simulation steering needs to handle such synchronization issues.

5.2 Reaction times of the postprocessing feedback loop

The collaborative analysis of a simulation in an explorative process requires modifying parameters of a visualization tool, such as a cutting plane position, or applying different tools as the exploration evolves. Collaborating partners always need to have the same state of information about the overall system and need to be able to change roles, i.e. actively steering the exploration process or passively watching while participating in the discussion. The delay until a parameter change in a visualization tool leads to updated scene content can vary strongly, ranging from fractions of a second to multiple seconds. The more stringent requirement here is that the update takes place at the same time at the different participating sites of a discussion. With a local feedback loop that generates a new cutting plane and renders it, depending on the interaction in a virtual environment, it is possible to reach 15 or more frames per second with modified content.
In a collaborative environment such scene update rates are only possible if the new content is generated locally and only synchronization information, such as the parameter set for the cutting plane determination, is exchanged.

5.3 Reaction times of the simulation feedback loop

The still acceptable delay after a modification of simulation parameters is defined by the time a human being is able to stay mentally in the model world of the simulation without noticing any reaction or activity of the system. Experiments showed that people can tolerate delays of up to a minute while waiting for new simulation results. This tolerance can even be increased if intermediate results, for example from an iterative solver, are displayed in between. For outstanding actions that do not show an effect for tens of seconds or more, the scientist needs a visual reminder that there are still ongoing activities, such as an hourglass icon for the cursor or an indicator of the remaining waiting time. Here, too, a collaborative session requires that the modified visual content appears synchronously to prevent discussion of inconsistent content.
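The frame rates quoted in sections 5.1 and 5.2 can be read as per-frame time budgets. The following minimal Python sketch compares such budgets with the round-trip time of a remote rendering loop. The stage timings in `REMOTE_STAGES_MS` are hypothetical placeholders, not measurements from COVISE, and the function names are chosen only for illustration.

```python
def frame_budget_ms(updates_per_second):
    """Time available for one frame, in milliseconds."""
    return 1000.0 / updates_per_second


def remote_loop_ms(stages):
    """Round-trip time of a remote rendering loop, rendering itself excluded."""
    return sum(stages.values())


def fits_budget(stages, updates_per_second):
    """True if the remote loop alone stays within the frame budget."""
    return remote_loop_ms(stages) <= frame_budget_ms(updates_per_second)


# Hypothetical stage timings (assumed values for illustration only).
REMOTE_STAGES_MS = {
    "send viewer position": 25.0,
    "compress image": 20.0,
    "transmit image": 50.0,
    "decompress image": 15.0,
}

if __name__ == "__main__":
    for label, rate in (("CAVE, lower bound", 10), ("CAVE, upper bound", 15), ("desktop", 3)):
        print(label, round(frame_budget_ms(rate), 1), "ms budget,",
              "fits" if fits_budget(REMOTE_STAGES_MS, rate) else "exceeded")
```

With these assumed timings the remote loop misses the budget for 10 to 15 updates per second but fits the less demanding desktop requirement of 3 frames per second, which mirrors the argument for local scene graphs.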


6 Collaborative virtual prototyping environment

Combining all the elements described above makes it possible to present engineering design and development processes using the concept of virtual prototypes. Such virtual prototypes integrate geometry and behaviour in a computer representation while allowing a user to interact with them as if they were real. Engineers can change parameters of the air conditioning of a future car as if they were in the real car. With online simulation coupling the model reacts to changes of the dashboard opening. Users need not focus only on the layout and optimization of products but can furthermore concentrate on the development of usage concepts and profiles, for example of buildings or cars. Layout concepts direct the focus of interest to certain areas of a technical product. For example, they avoid distracting a driver from the road while providing complementary information. In a virtual environment it can be shown how to guide humans through buildings or how to make users feel comfortable there. Mostly combined approaches are used, with maps and elements representing design concepts as well as animations explaining the concept.

6.1 Scaling ranges of 3D visualizations

The model size in a 3D visualization ranges from very small, like an air vent of a few millimeters diameter, to models of urban or landscape size of several kilometers. Although a CAVE has a limited size of approximately 2.5 to 3 meters side length, all dimensions can be represented. Because of the limited reach of the human body, the models are often scaled during interactions and then rescaled again. While the engineers scale up the vent to a size of a few meters to see single particles, the architects rescale their 1:1 model to architecture model scale to make changes and then rescale to 1:1 again to judge the changes.
6.2 Virtual prototype for car climate layout

In the framework of the European Community funded project VISiT (Virtual intuitive simulation testbed), multiple virtual prototyping scenarios from different European companies have been implemented and evaluated. The climate optimization for future cars of DaimlerChrysler was one such scenario. In a CAVE it became possible to enter a virtual car cabin, interact with the dashboard openings to change the amount and direction of inflowing air as well as its temperature (see figure 11), insert new openings at the dashboard as well as in the legroom, and also interact with them. Additionally seats could be moved, and different types of drivers and passengers could be selected to provide realistic variations of the climatisation conditions. All modifications of and interactions with the model were immediately passed to simulation codes running in the background. Within minutes engineers could see the modified behaviour of the virtual prototype and thus


Fig. 11. Virtual prototype of a car climate layout

optimize it. Additionally, discussions among specialists about the behaviour of the prototype are supported and therefore held with much more intensity. To implement this concept all simulation tools had to be hidden from the user and were only accessible via a virtual reality user interface. The user interface as well as the simulation and support tools were all implemented in COVISE as the software integration platform (see figure 12).

6.3 Virtual auto house

DaimlerChrysler is also promoting the application of interactive visualizations in architecture. To design a new generation of auto houses, virtual reality has been used from the very beginning of developing the general building concept to the final projects. Team meetings with many disciplines are held in the CAVE to discuss the architecture and its impact in 1:1 scale. Architects, brand managers, sales specialists, event designers, marketing specialists, artists, simulation experts (e.g. airflow, temperature) and even potential customers discuss in the 1:1 project representations. Figure 13 on the left shows particle paths of a climate simulation visualizing the air flow in a planned auto house. On the right the temperature distribution above the ground floor of the building is shown.


Fig. 12. Simulation tools integration for virtual prototyping

Fig. 13. DaimlerChrysler Auto house in a Virtual Environment

This way of working is part of the communication concept "MarkenStudio" [5], in which visualization plays a major role. It could be observed that as soon as there is "something to look at" and ideas are being visualized, it is much easier to achieve a commonly agreed meeting result. The participants are more willing to change their "point of view" and to understand and accept the ideas of the others. Additionally, virtual reality assists in reaching a high degree of planning safety at an early stage. To allow different users appropriate interaction with the model, different interaction methods have been implemented, such as a colour-picker (changing the colour of walls / floors / ceiling interactively), a texture-picker (changing textures on the fly to judge the right material), an exhibition-designer (creating, placing and modifying exhibitions interactively) and switching through variations.


As understanding takes a certain time, it became apparent that many ideas could be communicated much better with animations.

References

1. McCormick B, DeFanti T, Brown M (1987) Comput Graphics 6:1–14
2. The Visible Human Project (2003) http://www.nlm.nih.gov/research/visible/visible human.html
3. Rantzau D, Frank K, Lang U, Rainer D, Wössner U (1998) COVISE in the CUBE: an environment for analyzing large and complex simulation data. In: Proceedings of the 2nd Workshop on Immersive Projection Technology
4. http://www.opendx.org/
5. Drosdol J, Kieferle J, Wierse A, Wössner U (2003) Interdisciplinary cooperation in the development of customer-oriented brand architecture. In: Proceedings of Trends in Landscape Modeling, Dessau

Safety problems of technical objects

V.V. Moskvichev

Institute of Computational Modeling of SB RAS, Academgorodok, 660036 Krasnoyarsk, Russia

Summary. This study presents a review of research on the reliability and safety of technical systems carried out over the period 1990–2002 at the Department for Machine Science of ICM SB RAS. The following subjects are considered: 1) analysis of the failure causes of complex technical systems (CTS) in various industries and the types of their limiting states (primary, additional, emergency); 2) methods of verification calculations of fracture toughness; 3) computational algorithms and technologies for the life-cycle design of welded structures; 4) parameters for residual lifetime assessment, reliability and risk analysis of CTS. The basic data for the calculations were obtained from numerous fracture toughness tests and from the analysis of technological and operational defects of CTS. The methods developed were applied in calculations for a variety of structural applications, including building, crane and ship structures, welded joints of a reactor and of excavators, propeller blades of airplanes, frame structures of spaceships, pressure vessels, and pipeline systems.

1 Introduction

The modern development of engineering and technology is characterized by a high rate of growth and by scientific achievements in the aerospace, nuclear, energy, chemical and other industries. At the same time this gives rise to, and intensifies, potential and real threats to human society and the environment from technical objects that did not exist previously [1, 2]. Failure of complex technical systems and engineering structures is the main source of man-made disasters. This fact has led to the development of new methods to design and analyze CTS. In 1970–1980, in addition to the conventional methods of calculating the strength, durability and reliability of structural components, new approaches to the assessment of fracture toughness were implemented. Later, parameters of residual life, risk and safety were included in the design procedure (fig. 1). Quantitative assessment of these parameters requires extensive experimental and computational effort to study the causes and mechanisms of failure, to formulate the limit states and to model emergency conditions.


Fig. 1. Development of calculation methods for technical systems

2 Failures and limit states

At present Russian industry is characterized by a high degree of physical deterioration and obsolescence of the basic production assets, reaching 60 to 80%. The situation is worsened by the high portion (up to 50%) of accidents due to the “human factor” in the system “man-machine-environment” (fig. 2). Against this background the role of technological and constructional factors initiating emergency situations is intensified. Failure analysis of quarry machines and excavators, cranes, pressure vessels, heat-and-power equipment, technological and trunk pipelines shows that technological defects, fatigue and corrosion cracks, low quality of metals, residual stress and aggressive environments are the main causes of fracture (fig. 3). The relative weight of these factors depends on the type of CTS. In most cases, failure of CTS is related to initial technological defects in welded joints. Statistical analysis of weld defects typical of various industries has made it possible to determine the distribution functions of the defect types and sizes corresponding to different welding technologies. These distributions have been found to be determined predominantly by the manufacturing method rather than by the type of structure. To make engineering estimates of the safe life of structures with initial defects and operational damage, distinctive types of limit states have been identified (fig. 4). In general, the constitutive equations for the limit states include the parameters of the stress-strain state $\sigma$, $e$, the defect size $l$, and the characteristics of the static ($K_c$, $J_c$, $K_{ce}$) and cyclic ($C$, $n$) fracture toughness of materials:

$$\Phi\{\sigma, e, l, K_c, J_c, K_{ce}, C, n\} = 0. \tag{1}$$

Calculations according to equation (1) can be carried out using either a deterministic or a probabilistic approach. In the former case, quantitative assessments of the fracture toughness and residual life of CTS are carried out by applying the experimental and calculation methods of fracture mechanics. In the probabilistic approach, the methods developed are based on a combination of


the fracture mechanics criteria and the reliability theory. Since the parameters of equation (1) are stochastic variables, the possibility of a system reaching the limit state within a given service time $t$ can be estimated by a probabilistic measure, the risk function $R_f(t)$:

$$R_f(t) = P\{\Phi(x, t) = 0\} = 1 - \exp\left\{-\sum_{\Phi}\int_{t} \lambda_{\Phi}(t)\,dt\right\}, \tag{2}$$

where $\lambda_{\Phi}$ is the intensity of occurrence of a given limit state.
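As a numerical illustration of the risk function, the following Python sketch evaluates $R_f(t) = 1 - \exp\{-\int_0^t \lambda(s)\,ds\}$ for a single limit state. The lower integration limit of zero, the restriction to one limit state, the trapezoidal rule and the constant intensity `LAMBDA_PER_HOUR` are assumptions made for the example; they are not values or methods taken from the study.

```python
import math


def risk_function(intensity, t, steps=1000):
    """Risk function 1 - exp(-integral_0^t intensity(s) ds), trapezoidal rule.

    `intensity` is a callable returning the limit-state intensity at time s.
    """
    h = t / steps
    integral = 0.0
    for k in range(steps):
        s0 = k * h
        s1 = (k + 1) * h
        integral += 0.5 * (intensity(s0) + intensity(s1)) * h
    return 1.0 - math.exp(-integral)


# Hypothetical constant intensity of reaching the limit state (per hour).
LAMBDA_PER_HOUR = 1.0e-5

if __name__ == "__main__":
    for hours in (1_000.0, 10_000.0, 100_000.0):
        risk = risk_function(lambda s: LAMBDA_PER_HOUR, hours)
        print(f"t = {hours:>9.0f} h  ->  R_f = {risk:.4f}")
```

For a constant intensity the integral reduces to $\lambda t$, so the result can be checked against $1 - e^{-\lambda t}$.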

Fig. 2. Pattern of accident causes in the system “man-machine-environment”

The safe life of a structure is determined as the average time $T$ required for the structure to reach a given limit state:

$$T = \int_{0}^{\infty} t\,[1 - R_f(t)]\,dt. \tag{3}$$

Elaboration of this methodology opens up possibilities to analyze and solve problems of CTS safety, to develop and implement new methods of CTS risk assessment [3].

3 Structural materials

Evaluation of the limit state parameters is impossible without comprehensive knowledge of the mechanical properties of materials. For this purpose, a large number of tests on the static and cyclic fracture toughness of various structural materials have been carried out [4]. The effect of different parameters, such as scale factor, loading scheme and operational conditions, on the characteristics of elasto-plastic fracture of low-carbon and low-alloyed steels was of the greatest interest (fig. 5). Distribution functions for the critical value of the J-integral $J_c$ and


Fig. 3. Fracture causes of technical systems

Fig. 4. Limit states of technical systems


strain intensity factor $K_{ce}$ under elasto-plastic deformation of the materials have been obtained. The relationship determined between these measures of material crack resistance facilitates the analysis of CTS limit states. A large experimental database combined with a new method based on the J-integral is a powerful tool for fracture toughness calculations of structural components. Statistical investigations of the static and cyclic fracture toughness of different weld zones have made it possible to derive the probability distribution functions for widely used structural steels: St3, 09G2S, 10XCHD and others (fig. 7). Based on this study, the generalization of cyclic fracture diagrams for low-carbon and low-alloyed steels has been completed. The fracture toughness characteristics of structural materials under conditions of dynamic crack growth have been studied with a specially developed technique. Research on Al/B composite and ceramics, apart from detailed structural and mechanistic insights, has provided knowledge of the resistance to crack growth of these materials, which are finding increasing application in many fields of advanced engineering. The experimental results obtained on the fracture behavior of aluminum alloys and clad steels should also be mentioned.

Fig. 5. Research on characteristics of elasto-plastic fracture

4 Development of calculation methods

The combined application of fracture mechanics methods has made it possible to solve a number of important problems of strength and reliability assessment of CTS


Fig. 6. Limit state analysis for elasto-plastic fracture

structures. The numerical calculations and experimental data presented in fig. 6 demonstrate the validity of the J-design curve approach used for the fracture toughness calculations of plate structural components with stress concentrators. Strength assessment for load bearing components of space apparatus

Fig. 7. The fracture toughness of welded joints


made of fibrous metal matrix composite has been carried out assuming initial technological defects (fig. 8) [5]. To solve problems of the service life design of welded joints, a method of statistical testing has been developed. The method is based on the application of the defect kinetic equations coupled with numerical calculations of the stress-strain state and probabilistic models of defects, loading and fracture toughness (fig. 9). This makes it possible to obtain:
- reliability functions of welded joints in the presence of different types of weld defects;
- probabilistic diagrams of the residual life linking the number of loading cycles, the loading level and the probability of safe life;
- probabilistic diagrams of the service life, which allow estimating the influence of loading level, defect size, component thickness and operational temperature.

Results of deterministic and probabilistic modeling of the crack growth kinetics have led to the development of a method to evaluate the safe residual life of CTS structures in the form of equation (3) [6, 7].
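The idea of statistical testing can be illustrated with a small Monte Carlo sketch in Python. It assumes the Paris law $da/dN = C\,(\Delta K)^n$ with $\Delta K = Y \Delta\sigma \sqrt{\pi a}$ as the defect kinetic equation, a lognormal distribution of the initial defect size, and hypothetical material and loading constants. None of these choices is taken from the study; they only show how a probability of safe life versus the number of loading cycles could be estimated.

```python
import math
import random

# Hypothetical parameters (assumed for illustration only).
C_PARIS = 1.0e-11      # Paris coefficient, (m/cycle) / (MPa*sqrt(m))**n
N_PARIS = 3.0          # Paris exponent, must differ from 2 in this sketch
DELTA_SIGMA = 100.0    # stress range, MPa
A_CRIT = 0.02          # critical defect size, m
A0_MEDIAN = 1.0e-3     # median initial defect size, m
A0_SIGMA = 0.5         # standard deviation of ln(a0)


def cycles_to_failure(a0, a_crit, c, n, delta_sigma, y=1.0):
    """Cycles needed to grow a defect from a0 to a_crit (closed-form Paris law)."""
    if a0 >= a_crit:
        return 0.0
    k = c * (y * delta_sigma * math.sqrt(math.pi)) ** n
    p = 1.0 - n / 2.0
    return (a_crit ** p - a0 ** p) / (k * p)


def survival_probability(n_cycles, samples=5000, seed=0):
    """Fraction of sampled joints whose life exceeds n_cycles."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(samples):
        a0 = rng.lognormvariate(math.log(A0_MEDIAN), A0_SIGMA)
        life = cycles_to_failure(a0, A_CRIT, C_PARIS, N_PARIS, DELTA_SIGMA)
        if life > n_cycles:
            survived += 1
    return survived / samples


if __name__ == "__main__":
    for n_cycles in (1.0e5, 5.0e5, 1.0e6, 5.0e6):
        print(f"N = {n_cycles:.0e}: P(safe) = {survival_probability(n_cycles):.3f}")
```

Repeating the estimate for several stress ranges or defect distributions would give one curve per loading level, in the spirit of the probabilistic residual life diagrams listed above.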

Fig. 8. Failure modeling and the fracture toughness of space apparatus frame

The evolution of probabilistic fracture mechanics approaches and of reliability theory has provided a tool for the probabilistic risk analysis of structures, including estimation of the risk function (2) [3]. The algorithm includes (fig. 10):
- analysis of the stress-strain state of a structure;
- modeling of a potential fracture zone considering a possible limit state;
- calculation of the fracture probability.


Fig. 9. Design of welded structures

In solving problems of risk analysis, an important role belongs to mathematical modeling and numerical experiment [8, 9]. The methods and techniques mentioned above have been applied to evaluate the reliability and fracture risk parameters of components of the VVER-1000 reactor (fig. 11), welded metal structures of quarry excavators, high lift-capacity cranes, different types of pressure vessels, piping systems, components of power equipment and building metal structures. At present, our research is concentrated on assessing the safe residual life of machines and structures that have exceeded their normative lifetime.

Fig. 10. Risk analysis of technical system structures

Fig. 11. Reliability and risk assessment of welded components of VVER-1000 reactor

References

1. (1998) Safety of Russia. Functioning and development of complex national economic, technical, energy, transport systems, systems of communication. Znanie, Moscow
2. Moskvichev VV (2002) Fundamentals of structural strength of technical systems and engineering structures. Nauka, Novosibirsk
3. Lepikhin AM, Makhutov NA, Moskvichev VV, Cherniaev AP (2003) Probabilistic risk analysis of technical system structures. Nauka, Novosibirsk
4. Moskvichev VV, Makhutov NA, Cherniaev AP et al (2002) The fracture toughness and mechanical properties of structural materials. Nauka, Novosibirsk
5. Burov AE, Kokcharov II, Moskvichev VV (2003) Failure modeling and the fracture toughness of fibrous metal matrix composites. Nauka, Novosibirsk
6. Lepikhin AM, Makhutov NA, Moskvichev VV, Doronin SV (2000) Fatigue Fract Eng Mater Struct 23:395–401
7. Lepikhin A, Doronin S, Moskvichev V (1998) Theoret Appl Fract Mech 29:103–107
8. Shokin YuI, Moskvichev VV (1999) Comput Techn 4:100–110
9. Shokin YuI, Moskvichev VV (2002) Comput Techn 3:271–273

Direct numerical simulations of shock-boundary layer interaction at Ma = 6

A. Pagella and U. Rist

Institute of Aerodynamics and Gasdynamics, University of Stuttgart, Pfaffenwaldring 21, 70550 Stuttgart, Germany

Summary. Two boundary layers with an impinging shock wave at $Ma = 6$, $T_\infty = 78\,\mathrm{K}$ and a shock angle with respect to the wall of $\sigma = 12^\circ$ are compared: a boundary layer with insulated wall and a cooled case with $T_w = 300\,\mathrm{K}$. As expected, the length of the separation bubble is smaller for the case with cooled wall. Linear stability calculations show that the first instability mode could be completely stabilized by wall cooling in the underlying case. However, it is known that cooling destabilizes higher, acoustic modes, which is the case here, too. An oblique breakdown scenario reveals the formation of longitudinal vortices in both cases with shock, mainly promoted by the non-linear growth of the (0, 2) mode. For the disturbance parameters chosen, the maximum disturbance amplitudes are larger for the case with insulated wall. The structure of the (0, 2) mode in the cases with shock differs from that in the boundary layer without shock. The wall-normal velocity component $v$ in the base flow of the boundary layer without shock counteracts the formation of longitudinal vortices in the total flow.

1 Introduction

In practice, hypersonic flow situations mainly occur during re-entry into the earth's atmosphere. Re-entry is one of the most critical phases of a space-flight mission. A structural failure is difficult to handle and may lead to a total loss of the vessel. Therefore, a profound knowledge of the physics is absolutely necessary. Hypersonic flow is defined at Mach numbers of four to five and higher. There are three main physical effects to be considered:

Real gas effects. During the re-entry trajectory, at a certain flight level a space vessel encounters very high temperatures, at which the gas can no longer be treated as ideal. If such high-temperature flows are to be investigated, real gas behaviour has to be modelled and cannot be neglected. In the underlying work, these real gas effects are not taken into account. We limit ourselves to cold flows below ≈ 2500 K, the borderline to dissociation at standard conditions, with a smaller Mach-number


at the lower end of the hypersonic regime. Under these flow conditions, an ideal-gas assumption is justified. Taking real gas effects into account is certainly a subject for further studies.

Shock boundary layer interaction. In flows faster than the speed of sound, a change in flow direction always results in either a compression or an expansion, depending on the direction of the turn. Compression waves can merge into a shock, which in turn is very likely to hit a boundary layer on the structure of the craft. In fact this so-called shock-boundary layer interaction is a major source of high heat or pressure loads and often causes separation of the flow. In hypersonic flows, these loads can become very high. Shock-boundary layer interactions have been studied since the mid-1940s. The first systematic experimental studies were carried out by Ackeret et al. [1] and Liepmann [2]. Due to the pressure rise, an impinging shock wave causes the boundary layer to thicken. The shock penetrates into the boundary layer and ends at the sonic line as an almost vertical shock. There it is reflected as a system of expansion waves. Provided the pressure gradient is strong enough, the boundary layer separates. The thickening results in a deflection of the flow, yielding compression waves near separation and reattachment. Well outside the boundary layer, they coalesce into the separation and reattachment shock, respectively. A more thorough description of shock-boundary layer interactions can be found in [4].

Transition to turbulence. Transition from a laminar to a turbulent flow also entails high aerodynamic loads. It has been a major area of concern in the past decades, and a lot of research has been carried out on understanding and possibly influencing transition. However, although a lot of progress has been achieved, the physics is far from being understood. For compressible flows, such as hypersonic flows, much less has been done than for incompressible flows.
For the first phase of the transition process, quantitative predictions can be made with compressible linear stability theory, which was formulated by Mack [7]. Eißler & Bestek [15] and Fezer & Kloker [11] investigated transition to turbulence of flat-plate boundary layers at Mach numbers ranging from about four to six. Experiments with controlled, artificial disturbances in hypersonic flows are very difficult to perform. Therefore, only few are known [18], [51].

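2 Numerical Scheme

2.1 Governing Equations

The numerical scheme is based on the complete, three-dimensional, unsteady, compressible Navier-Stokes equations for Cartesian coordinates in conservative formulation:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0, \tag{1}$$

$$\frac{\partial (\rho \mathbf{u})}{\partial t} + \nabla \cdot (\rho \mathbf{u}\mathbf{u}) + \nabla p = \frac{1}{Re}\, \nabla \cdot \boldsymbol{\sigma}, \tag{2}$$

$$\frac{\partial (\rho e)}{\partial t} + \nabla \cdot (p + \rho e)\mathbf{u} = \frac{1}{(\kappa - 1)\,Re\,Pr\,Ma^2}\, \nabla \cdot (\vartheta \nabla T) + \frac{1}{Re}\, \nabla \cdot (\boldsymbol{\sigma}\mathbf{u}), \tag{3}$$

where

$$\boldsymbol{\sigma} = \mu \left[ (\nabla \mathbf{u} + \nabla \mathbf{u}^T) - \frac{2}{3} (\nabla \cdot \mathbf{u})\,\mathbf{I} \right]$$

with the velocity vector $\mathbf{u} = [u, v, w]^T$. The energy is calculated as

$$e = \int c_v \, dT + \frac{1}{2}(u^2 + v^2 + w^2). \tag{4}$$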

The fluid is a non-reacting, ideal gas with constant Prandtl number $Pr = 0.71$ and specific heat ratio $\kappa = c_p/c_v = 1.4$, with $c_p$ and $c_v$ as the specific heat coefficients at constant pressure and volume, respectively. The viscosity $\mu$ is calculated by Sutherland's law for temperatures above the Sutherland temperature $T_s$, and with the relation $\mu/\mu_\infty = T/T_\infty$ for temperatures below $T_s$. The thermal conductivity coefficient $\vartheta$ is proportional to the viscosity. In our simulations, all lengths are made non-dimensional with a reference length $L$, which appears in the global Reynolds number $Re = \rho_\infty \cdot u_\infty \cdot L/\mu_\infty = 10^5$. A local Reynolds number is defined as $R_x = \sqrt{x \cdot Re}$. The specific heat $c_v$ is normalized with $u_\infty^2/T_\infty$ (with $T_\infty$ being the free-stream temperature) and time $t$ is normalized with $L/u_\infty$, where $u_\infty$ is the free-stream velocity. Density $\rho$, temperature $T$ and viscosity $\mu$ are normalized by their respective free-stream values.

Figure 1 shows the integration domain. The calculation starts at $X_0$; the end of the integration domain is given by $X_N$. $X_s$ gives the location of the shock, which is prescribed at the free-stream boundary. A buffer domain [33] can be switched on at $X_3$, damping the disturbances in order to provide an undisturbed, laminar flow at the outflow boundary. The disturbance strip is located at $X_1 \le x \le X_2$. The disturbances are periodic in the spanwise direction with a wavelength $\lambda_z$, which determines the width of the integration domain as $z_N = \lambda_z$.

2.2 Discretization

For a more thorough description of the numerical scheme the reader is referred to [25] and [9]. Time integration is performed at equidistant time steps with a standard Runge-Kutta scheme of fourth-order accuracy (see for example [46]).
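For orientation, a fourth-order Runge-Kutta step at a fixed step size can be sketched in Python as follows. The sketch assumes that the "standard" scheme is the classical four-stage method and uses a scalar decay equation as a stand-in for the semi-discrete Navier-Stokes right-hand side; the function names are illustrative and do not come from the authors' code.

```python
import numpy as np


def rk4_step(rhs, t, y, dt):
    """Advance y from t to t + dt with the classical four-stage Runge-Kutta scheme."""
    k1 = rhs(t, y)
    k2 = rhs(t + 0.5 * dt, y + 0.5 * dt * k1)
    k3 = rhs(t + 0.5 * dt, y + 0.5 * dt * k2)
    k4 = rhs(t + dt, y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(rhs, y0, t_end, n_steps):
    """Integrate from t = 0 to t_end with n_steps equidistant time steps."""
    y = np.asarray(y0, dtype=float)
    dt = t_end / n_steps
    for step in range(n_steps):
        y = rk4_step(rhs, step * dt, y, dt)
    return y


def decay(t, y):
    """Model right-hand side dy/dt = -y, used only as a test problem."""
    return -y


if __name__ == "__main__":
    for n_steps in (50, 100):
        error = abs(integrate(decay, [1.0], 1.0, n_steps)[0] - np.exp(-1.0))
        print(n_steps, "steps, error", error)
```

Halving the time step should reduce the error by roughly a factor of 16, which is the signature of fourth-order accuracy.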
In the streamwise direction, compact finite differences of sixth-order accuracy are applied. They are used in a split-type formulation in order to provide some damping of the small-scale numerical oscillations ([8]) that occur at the high gradients resulting from the shock. In the split-type formulation, the weighting of the numerical stencil alternates at each Runge-Kutta step from downwind to upwind and vice versa. Near the boundaries, differences of fourth- and second-order accuracy are applied, ensuring that the stencils remain within


A.Pagella and U. Rist

Fig. 1. Integration domain

the integration domain. If a stronger shock is present, the damping characteristic of the split-type formulation is not sufficient in the two-dimensional base-flow calculation. In this case, an implicit filter of fourth-order accuracy ([10]) is applied in the streamwise direction to the variables of the solution vector at each physical time step:

$$\alpha\hat{f}_{i-1} + \hat{f}_{i} + \alpha\hat{f}_{i+1} = a f_{i} + \frac{c}{2}\left(f_{i+2} + f_{i-2}\right) + \frac{b}{2}\left(f_{i+1} + f_{i-1}\right), \qquad (5)$$

with

$$a = \frac{1}{8}(5 + 6\alpha), \quad b = \frac{1}{2}(1 + 2\alpha), \quad c = -\frac{1}{8}(1 - 2\alpha). \qquad (6)$$

Here α is the filtering parameter. α = 0.5 means no filtering, while α = 0.495 is typically used in our simulations. Because the filter may influence the calculations, particular attention has been paid to the grid independence of our simulations. It turned out that with sufficiently small step sizes no influence of the filter could be observed. Both for the filtering and for the streamwise derivatives, the resulting tridiagonal system of equations is solved by the Thomas algorithm (see e.g. [44]). In the wall-normal direction, split-type finite differences of fourth-order accuracy are used to calculate the convective terms, while the viscous terms are calculated by fourth-order central differences. As for the streamwise derivatives, the finite differences at the boundaries are adapted to fit into the integration domain, while here the formal order of accuracy is kept. In the spanwise direction we have periodic boundaries, which allow a spectral approximation with a Fourier expansion (see e.g. [49]). The transformation between Fourier and physical space is performed with a standard fast Fourier transform, such as described in [50].
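The filter of Eqs. (5) and (6) together with the Thomas algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the two points next to each boundary are simply left unfiltered, whereas the text does not say how the boundaries are treated.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are not used."""
    n = len(diag)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_filter(f, alpha=0.495):
    """Apply the fourth-order implicit filter of Eqs. (5), (6) along a line."""
    n = len(f)
    a = (5.0 + 6.0 * alpha) / 8.0
    b = (1.0 + 2.0 * alpha) / 2.0
    c = -(1.0 - 2.0 * alpha) / 8.0
    lower = [0.0] * n
    diag = [1.0] * n
    upper = [0.0] * n
    rhs = list(f)  # assumption: boundary rows are identity (no filtering)
    for i in range(2, n - 2):
        lower[i] = alpha
        upper[i] = alpha
        rhs[i] = (a * f[i]
                  + 0.5 * c * (f[i + 2] + f[i - 2])
                  + 0.5 * b * (f[i + 1] + f[i - 1]))
    return thomas(lower, diag, upper, rhs)

if __name__ == "__main__":
    sawtooth = [(-1.0) ** i for i in range(201)]
    filtered = implicit_filter(sawtooth, alpha=0.495)
    print(abs(filtered[100]))  # grid-scale oscillation is strongly damped
```

With these coefficients a constant field passes through unchanged, α = 0.5 reproduces the input, and the grid-scale sawtooth mode is removed away from the boundaries, which matches the role the filter plays in the text.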


2.3 Boundary Conditions

At the free-stream boundary, a characteristic boundary condition ([26]) is applied, in which the flow variables are held constant along the characteristic

$$\left(\frac{\partial y}{\partial x}\right)_{+} = \tan\left[\sin^{-1}\left(\frac{1}{Ma}\right) + \tan^{-1}\left(\frac{v}{u}\right)\right]; \qquad (7)$$

more recently, a non-reflecting boundary condition according to Thompson [31] has been used. The basic idea of this non-reflecting boundary condition is to neglect the parabolic terms in the wall-normal derivatives of the Navier-Stokes equations, thus obtaining a hyperbolic problem, which is then converted into characteristic form. The incoming characteristics are then set to zero. The shock wave is introduced by holding the flow variables constant in a limited area at the free-stream boundary, according to the Rankine-Hugoniot relations behind the shock and the initial free-stream conditions ahead of the shock. The flow quantities at the inflow boundary result from the solution of the compressible boundary layer equations and are held constant throughout the simulation. At the wall, a no-slip condition and vanishing normal velocity are assumed. Disturbances are introduced by simulated blowing and suction at a disturbance strip located between $X_1$ and $X_2$ in figure 1. The disturbance function is

$$f_{\rho v}(\varsigma, z, t) = \hat{a}\,\sin(F t)\,\cos(k\beta z)\,f_r(\varsigma). \qquad (8)$$

In our modal discretization in the spanwise direction, k indicates the spanwise Fourier modes, with k = 0 meaning a two-dimensional disturbance. The disturbance frequency F determines the streamwise wave number $\alpha_r$ via the dispersion relation of the disturbances. The spanwise wave number is β. Thus, the obliqueness angle ψ is given by $\tan\psi = (k\beta)/\alpha_r$. $\hat{a}$ is the disturbance amplitude and $f_r(\varsigma)$ the spatial disturbance function

$$f_r(\varsigma) = \varsigma^{3}\left(3\varsigma^{2} - 7\varsigma + 4\right), \quad 0 \le \varsigma \le 1, \qquad (9)$$

$$f_r(2 - \varsigma) = -f_r(\varsigma), \qquad (10)$$

with

$$\varsigma = \frac{2(x - x_1)}{x_2 - x_1}. \qquad (11)$$

The wall can be treated either as isothermal (constant wall temperature) or as adiabatic. At the outflow boundary, second derivatives are neglected. To provide an undisturbed base flow at the outflow boundary, disturbances are damped artificially [9] in a buffer region located between $X_3$ and $X_N$ in figure 1.
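The blowing and suction function of Eqs. (8) to (11) can be sketched as below. The function and parameter names are illustrative assumptions, as is the choice to return zero outside the strip; the streamwise wave number passed to the obliqueness-angle helper is a hypothetical input, since in the paper it follows from the dispersion relation.

```python
import math

def f_r(s):
    """Spatial shape of Eqs. (9), (10): defined on 0 <= s <= 2, antisymmetric about s = 1."""
    if s < 0.0 or s > 2.0:
        return 0.0  # assumption: no forcing outside the disturbance strip
    if s <= 1.0:
        return s ** 3 * (3.0 * s ** 2 - 7.0 * s + 4.0)
    return -f_r(2.0 - s)

def disturbance(x, z, t, x1, x2, amp, freq, k, beta):
    """Wall forcing of Eq. (8) at position (x, z) and time t."""
    s = 2.0 * (x - x1) / (x2 - x1)  # Eq. (11)
    return amp * math.sin(freq * t) * math.cos(k * beta * z) * f_r(s)

def obliqueness_angle(k, beta, alpha_r):
    """Angle psi with tan(psi) = k * beta / alpha_r, in radians."""
    return math.atan2(k * beta, alpha_r)

if __name__ == "__main__":
    samples = [f_r(2.0 * j / 100) for j in range(101)]
    print(sum(samples))  # close to zero: blowing and suction cancel
```

Because the shape is antisymmetric about the middle of the strip, the suction over one half balances the blowing over the other half, so the forcing adds no net mass at any instant.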


3 Computational Performance

The code usually runs on the NEC SX-4 and NEC SX-5 of the High Performance Computing Center in Stuttgart (HLRS). The results presented in this paper represent problems of small size and were therefore computed on the smaller machine, the NEC SX-4. The grid of the base flow has m × n = 301 × 1201 = 361501 points; the simulations with controlled disturbances use a grid with m × n = 301 × 650 = 195650 points and k = 10 harmonics in the spanwise direction. Here m is the number of grid points in the wall-normal direction and n is the number of grid points in the streamwise direction. Simulations at Ma = 6, σ = 12° with insulated wall typically run on three processors at 2688 MFLOPS using 751 MByte. Code vectorization is 98.8%.

4 Results

In the following, results at Ma = 6 with a free-stream temperature of T∞ = 78 K and a shock angle of σ = 12° are presented. Simulations with both adiabatic and cooled wall conditions (Tw = 300 K = const.) have been carried out.

4.1 Base Flow Properties

The upper picture in figure 2 gives the density field of a simulation with insulated wall and the free-stream conditions mentioned before. η = y · Re/Rx is a wall-normal similarity parameter. The thickening of the boundary layer due to the impinging shock wave can clearly be seen. It begins at Rx ≈ 1140. Also, the boundary layer is somewhat thinner behind the interaction region than upstream of it. The lower picture in figure 2 shows the density distribution versus Rx, extracted at η = 50 = const. The locations of typical interaction phenomena in the free stream, such as the compression waves near separation and reattachment, the impinging shock wave and the expansion fan, which were already discussed in the introduction, are marked accordingly. The pressure gradient of the impinging shock wave causes the boundary layer to separate, provided the shock is strong enough. Figure 3 gives the skin friction distribution of both the insulated-wall case and the case with Tw = 300 K = const. For validation purposes, results of grid-refinement studies are also given, represented by the filled symbols. They show the grid independence of our simulations. Simulations with both higher and longer integration domains indicated no influence of the boundaries (not shown here). The boundary layer of the cooled wall is thinner than the adiabatic boundary layer; therefore the skin friction coefficient of the incoming flow in figure 3 is larger than for the adiabatic boundary layer. For the same shock angle


Fig. 2. Density field (upper picture) and density distribution for η = 50 = const. Ma = 6, σ = 12°, insulated wall

of σ = 12°, the length of the separation bubble, which can be identified by its negative skin friction, for the case with cooled wall is only ≈ 60% of the length for the case with insulated wall.

Fig. 3. Skin friction distribution of both the cases with insulated wall and constant wall temperature Tw = 300K

The wall-temperature distribution of the adiabatic case is given in the upper picture of figure 4. Caused by the influence of the shock, the wall temperature rises. Inside the separation bubble, the wall temperature nearly remains


Fig. 4. Wall temperature distribution of the adiabatic boundary layer (upper picture) and wall pressure distributions

constant, similar to the typical plateau in the wall-pressure distribution. The total rise of the temperature over the interaction region does not exceed 13 K. The wall pressure distribution is shown in the lower picture of figure 4. The larger separation bubble of the case with insulated wall is reflected in the wall pressure distribution as well. Because the shock is not very strong, the plateau is not pronounced in either the adiabatic or the cooled-wall case. The total rise of the wall pressure over the interaction region is slightly larger for the case with constant, cooled wall temperature.

4.2 Small-Disturbances Development

In this section, results of compressible linear stability theory computations are shown, which are based on the scheme developed by [7]. Figure 5 shows such computations for Ma = 6 without impinging shock, for the cooled wall (upper picture) as well as the insulated wall (lower picture). Given are amplification rates $-\alpha_i = \partial \ln(A(x)/A_0)/\partial x$, where $A(x)/A_0$ is the amplitude ratio of any flow variable with respect to its initial amplitude. F is the disturbance frequency and Rx the square root of the local Reynolds number. The solid lines in the plots, labeled "0", are the lines of neutral amplification; darker shadings indicate larger amplification rates. In the upper part of figure 5, the case


Fig. 5. Amplification rates αi with respect to the disturbance frequency F and the square root of the local Reynolds number Rx . Constant wall temperature Tw = 300K (upper picture) and insulated wall (lower picture). No shock

Fig. 6. Amplification rates αi with respect to the disturbance frequency F and the square root of the local Reynolds number Rx. Constant wall temperature Tw = 300 K (upper picture) and insulated wall (lower picture). Shock angle σ = 12°


with the constant wall temperature of Tw = 300 K, only the second instability mode is present. Wall cooling results in a complete stabilization of the first instability mode here. In the lower part of figure 5, the adiabatic case, both the first and second instability modes are amplified. However, the first and second modes do not occupy distinct, separated regions; they merge into each other. Because the second mode is more strongly amplified than the first mode in this configuration, the two modes can still be identified. Compared to the second mode in the case with Tw = 300 K, the second mode of the adiabatic case has smaller amplification rates. This, too, is an effect of wall cooling in the case with Tw = 300 K: although cooling stabilizes the first mode, higher modes are known to be destabilized by it. We now turn to the case with impinging shock wave as presented in the previous section. Figure 6 gives amplification rates with respect to the disturbance frequency F and the square root of the local Reynolds number Rx for two-dimensional linear disturbances. They are obtained by extracting local u and T profiles from the two-dimensional direct numerical simulation presented before, which are used as input data for the linear stability solver. The upper picture in figure 6 gives the amplification rates for the case with Tw = 300 K, while the lower picture gives the amplification rates for the insulated-wall case. The insulated-wall case behaves similarly to results obtained earlier for Ma = 4.8, which can be found in [38]. The first mode vanishes near shock impingement, while the second mode increases in amplification rate and is shifted to lower frequencies. New instabilities form at higher frequencies. In [38], the increase of the second-mode amplification rates is explained by an increase of the thickness of the local supersonic flow region.
Diminishing viscosity caused by the separation of the boundary layer seems to play an important role, too. The cooled-wall case, which is given in the upper picture of figure 6, shows corresponding behaviour for its second-mode instability. However, compared to the adiabatic case, the amplification rates remain larger over the whole parameter range given in the plot. The identity of a mode is determined by the number of zeros of the pressure eigenfunction (cf. [7]). Figure 7 gives eigenfunctions and phase distributions at two different Reynolds numbers and various disturbance frequencies for the insulated case with shock. The solid lines at Rx = 900 correspond to a first-mode instability, because no zero is present in the eigenfunction, while the dashed line refers to a second mode (one zero). At Rx = 900 the boundary layer is not influenced by the shock-boundary layer interaction. Rx = 1300 lies well inside the separation bubble. The solid line in the corresponding pictures represents the eigenfunction and the phase distribution for a second mode (one zero). As briefly explained before, due to the influence of the shock, new instabilities are formed at higher frequencies near the interaction zone. These belong to a third mode, which can be concluded from the two zeros (dashed lines in figure 7 at Rx = 1300).
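The mode identification described above, where the mode number is one more than the number of zeros of the pressure eigenfunction, can be sketched as follows. This is only an illustration: it assumes the eigenfunction has already been reduced to a real, signed wall-normal profile (for example after removing a constant phase), and both function names are hypothetical.

```python
def count_zeros(profile):
    """Count sign changes in a sampled real profile, ignoring exact zeros."""
    nonzero = [v for v in profile if v != 0.0]
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0.0)

def mode_number(pressure_profile):
    """First mode: no zero, second mode: one zero, third mode: two zeros."""
    return count_zeros(pressure_profile) + 1

if __name__ == "__main__":
    print(mode_number([1.0, 0.8, 0.3, 0.1]))    # no sign change
    print(mode_number([1.0, 0.4, -0.2, -0.1]))  # one sign change
    print(mode_number([1.0, -0.5, 0.3, 0.1]))   # two sign changes
```

In practice a zero of the eigenfunction amplitude shows up together with a phase jump of about π, which is why the phase distributions are plotted alongside the eigenfunctions.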


Fig. 7. Pressure eigenfunctions and their corresponding phase distributions at Rx = 900 and Rx = 1300 for the adiabatic case. Shock angle σ = 12°

4.3 Larger-Disturbances Development

We now discuss the non-linear behaviour of the same configurations used in earlier sections. For Ma = 4.5, results for fundamental, subharmonic and oblique disturbance scenarios were shown in [43]. It turned out that, independent of the disturbance scenario, a strong increase of the so-called streak or vortex modes (0, k) could be observed downstream of shock impingement. However, the amplitude was too small for vortices to be observed in the total flow. In the literature the occurrence of such vortices is typically explained by a Görtler mechanism, triggered by the concave curvature near reattachment ([40, 41]). Figure 8 shows maximum temperature amplitudes of the direct numerical simulation in the oblique scenario, which were obtained by a timewise Fourier analysis over one disturbance period. For comparison, results of the case without impinging shock wave are given as well, represented by the solid lines with filled circles. The wall condition is adiabatic here. In the oblique disturbance scenario, we have a single three-dimensional disturbance wave, whose parameters are given in the plot. In figure 8 we can see that, downstream of shock impingement, a strong growth of all generated disturbance modes occurs. The highest amplitude is reached by (0, 2), which represents the first directly generated streak or vortex mode.


Fig. 8. Maximum temperature disturbance amplitudes. Lines with symbols represent case without shock. σ = 12°, insulated wall, oblique disturbance scenario. Spanwise wave number β = 10.4, disturbance frequency $F = 1 \cdot 10^{-4}$

Fig. 9. Maximum temperature disturbance amplitudes. σ = 12°, Tw = 300 K, oblique disturbance scenario. Spanwise wave number β = 10.4, disturbance frequency $F = 1 \cdot 10^{-4}$

In the case with shock, the amplitudes of the disturbance modes exceed the corresponding amplitudes in the case without shock by several orders of magnitude.
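The timewise Fourier analysis over one disturbance period, used to obtain the amplitudes in figure 8, can be illustrated with a plain discrete Fourier sum. This sketch covers only the frequency index of a mode, it assumes equidistant samples over exactly one period, and the function name and the synthetic test signal are hypothetical.

```python
import cmath
import math

def harmonic_amplitudes(samples, n_harmonics):
    """Amplitudes of harmonics 0..n_harmonics from one period of a real signal.

    Assumes n_harmonics < len(samples) / 2 and equidistant sampling.
    """
    n = len(samples)
    amps = []
    for h in range(n_harmonics + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * h * j / n)
                    for j, s in enumerate(samples)) / n
        # The mean (h = 0) is not doubled; higher harmonics combine +h and -h.
        amps.append(abs(coeff) if h == 0 else 2.0 * abs(coeff))
    return amps

if __name__ == "__main__":
    n = 32
    signal = [0.5 + 0.3 * math.sin(2.0 * math.pi * j / n)
              + 0.05 * math.cos(4.0 * math.pi * j / n) for j in range(n)]
    print(harmonic_amplitudes(signal, 2))
```

Taking the maximum of such amplitudes over the wall-normal direction at each streamwise station would give curves of the kind shown in the amplitude plots.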


Fig. 10. Spanwise disturbance velocity component (grey colour map) and selected streamlines are given in the left picture, while in the right picture, the wall-normal disturbance velocity distributions of $w'$ and $v'$ are shown. Single (0, 2)-mode, σ = 12°, Rx = 1600, insulated wall, oblique disturbance scenario. Spanwise wave number β = 10.4, disturbance frequency $F = 1 \cdot 10^{-4}$

Figure 9 shows maximum temperature amplitudes of the case with Tw = 300 K. We observe a behaviour similar to that of the insulated case. However, the maximum amplitudes reach smaller values in the case with Tw = 300 K. There, the amplitude of (0, 2) increases by a factor of ≈ 47 from the beginning of its rise, while in the insulated-wall case this factor is ≈ 63. It has to be noted that the initial amplitude of the disturbances is slightly smaller for Tw = 300 K than in the adiabatic case. Figure 10 shows the single vortex mode (0, 2) for the adiabatic case at Rx = 1600, a location where it already reaches considerable amplitudes. The left picture shows the spanwise disturbance velocity field and selected streamlines. The right picture gives the wall-normal distributions of the disturbance velocity components $v'$ in the wall-normal direction and $w'$ in the spanwise direction. From the streamlines in the left picture of figure 10 we see four counter-rotating vortices with their cores at y ≈ 1.5. If we add the largest vortex modes and the changes to the base flow, which are represented by (0, 0), to the base flow, we again obtain four counter-rotating vortices, which can be seen in figure 11. In the case without shock (figure 13), the single (0, 2)-mode has a different shape. Instead of one single vortex in the wall-normal direction, two vortices are present. As expected, the maximum values of the disturbance velocity components $v'$ and $w'$ are significantly smaller than in the case with shock.


Fig. 11. Spanwise disturbance velocity component (grey colour map) and selected streamlines of the total flow are given. σ = 12°, Rx = 1600, insulated wall, oblique disturbance scenario. Spanwise wave number β = 10.4, disturbance frequency $F = 1 \cdot 10^{-4}$

Fig. 12. Spanwise disturbance velocity component (grey colour map) and selected streamlines of the total flow are given. No shock, Rx = 1600, insulated wall, oblique disturbance scenario. Spanwise wave number β = 10.4, disturbance frequency $F = 1 \cdot 10^{-4}$

In the total flow, there are no vortices in the case without shock, which can be seen in figure 12.


Fig. 13. Spanwise disturbance velocity component (grey colour map) and selected streamlines are given in the left picture, while in the right picture, the wall-normal disturbance velocity distributions of $w'$ and $v'$ are shown. Single (0, 2)-mode, no shock, Rx = 1600, insulated wall, oblique disturbance scenario. Spanwise wave number β = 10.4, disturbance frequency $F = 1 \cdot 10^{-4}$

Figure 14 gives an explanation for this. Wall-normal base-flow velocity components are given for both the case without impinging shock wave (left picture) and the case with σ = 12°. In the case with shock, the magnitude of the v-component of the base flow is exceeded by the corresponding disturbance velocity component around the vortex core; this is not so in the case without impinging shock wave. For both the upper and lower vortices in figure 13, the magnitude of the base flow is one order of magnitude higher than the disturbance amplitude $v'$. While the behaviour of the total flow in the case with σ = 12° is dominated by the (0, 2)-mode, in the case without shock the base flow still dominates.

5 Conclusion

Numerical simulations for a boundary layer at Ma = 6, T∞ = 78 K with impinging shock wave (shock angle σ = 12°), both for a constant wall temperature of Tw = 300 K and for insulated wall conditions, have been presented. In the base flow, wall cooling decreases the length of the separation bubble. In the case considered, the length of the separation bubble was reduced to 60% of the corresponding value with insulated wall. For small disturbance amplitudes, first-mode instabilities were completely stabilized by wall cooling. However, cooling caused a significant destabilization of the second


Fig. 14. Wall-normal velocity distribution of the base flow at Rx = 1600 for the case without shock (left picture) and σ = 12°

mode. In the investigations with larger disturbance amplitudes, in both cases vortices could be identified in the oblique breakdown scenario. In a similar case without shock-boundary layer interaction, vortices are not present, because of a different shape and smaller magnitude of the (0, 2)-mode and the v-component of the base flow velocity counteracting the formation of vortices in the case without shock. As in the linear case, the maximum disturbance amplitudes in the oblique scenario reach higher values for the insulated case.

References

1. Ackeret J, Feldmann F, Rott N (1946) Untersuchungen an Verdichtungsstößen in schnell bewegten Gasen. Technical report, ETH Zürich, Institut für Aerodynamik 10
2. Liepmann HW (1946) J Aeronaut Sci 13:623–637
3. Gadd GE, Holder DW, Regan JD (1954) Proc Roy Soc A226:227–253
4. Délery J, Marvin JG (1986) Shock-Wave Boundary Layer Interactions. AGARDograph 280
5. Hakkinen RJ, Greber J, Trilling L, Abarbanel SS (1959) The interaction of an oblique shock wave with a laminar boundary layer. Technical report, NASA MEMO 2-18-59w
6. Katzer E (1989) J Fluid Mech 206:477–496
7. Mack LM (1969) Boundary layer stability theory. Technical report, Jet Propulsion Laboratory, Pasadena 900:277
8. Kloker MJ (1998) Appl Sci Res 59:353–377
9. Eißler W (1995) Numerische Untersuchungen zum laminar-turbulenten Strömungsumschlag in Überschallgrenzschichten. PhD thesis, Universität Stuttgart
10. Lele SK (1992) J Comp Phys 103:16–42
11. Fezer A, Kloker M (1999) Transition Process in Mach 6.8 Boundary Layers at Varying Temperature Conditions Investigated by Spatial Direct Numerical Simulation. In: Nitsche W, Heinemann HJ, Hilbig R (eds) New Results in Numerical and Experimental Fluid Mechanics II. Vieweg. Notes on Numerical Fluid Mechanics 72:138–145
12. Schlichting H (1979) Boundary-Layer Theory. McGraw-Hill, seventh edition
13. Henckels H, Kreins AF, Maurer F (1993) Z Flugwiss Weltraumforsch 17(2):116–124
14. Adams N (2000) J Fluid Mech 420:47–83
15. Eißler W, Bestek H (1996) Theoret Comput Fluid Dynamics 8:219–235
16. Eißler W, Bestek H (1996) Direct numerical simulation of transition in Mach 4.8 boundary layers at flight conditions. In: Rodi W, Bergeles G (eds) Engineering Turbulence Modelling and Experiments, Elsevier 3:611–620
17. Hein S, Bertolotti FP, Simen M, Hanifi A, Henningson D (1994) Linear nonlocal instability analysis – the linear NOLOT code. Technical report DLR-IB 223-94 A56
18. Kosinov AD, Maslov AA, Shevelkov SG (1990) J Fluid Mech 219:621–633
19. Pruett CD (1993) A comparison of PSE and DNS for high-speed boundary-layer flows. In: Kral LD, Zang TA (eds) Transitional and Turbulent Compressible Flows. FED, ASME, New York 151:57–67
20. Stetson KF, Kimmel RL (1992) On hypersonic boundary-layer stability. AIAA Paper 92-073
21. Saric W, Reshotko E, Arnal D (1998) Hypersonic Laminar-Turbulent Transition. Technical report, AGARD AR-319
22. Malik MR (1989) AIAA J 27:1487–1493
23. Herbert T (1988) Ann Rev Fluid Mech 20:487–526
24. Bertolotti FP, Herbert T, Spalart PR (1992) J Fluid Mech 242:441–474
25. Thumm A (1991) Numerische Untersuchungen zum laminar-turbulenten Strömungsumschlag in transsonischen Grenzschichtströmungen. PhD thesis, Universität Stuttgart
26. Harris P (1993) Numerical investigation of transitional compressible plane wakes. PhD thesis, University of Arizona
27. Anderson Jr. JD (1990) Modern compressible flow. McGraw-Hill
28. Lees L, Lin CC (1946) Investigation of the compressible laminar boundary layer. Technical report, NACA Tech Note 1115
29. Adams NA (1993) Numerische Simulation von Transitionsmechanismen in kompressiblen Grenzschichten. Technical report DLR-FB 93-29:28–29
30. Fezer A, Kloker M, Wagner S (2001) DNS of transition mechanisms on a sharp cone at Ma=6.8 and flight conditions. In: Proceedings Euromech Colloquium
31. Thompson KW (1987) J Comput Phys 68:1–24
32. Pagella A, Rist U, Wagner S (2001) Numerical investigations of small-amplitude disturbances in a laminar boundary layer with impinging shock waves. In: Wagner S, Rist U, Heinemann J, Hilbig R (eds) New Results in Numerical and Experimental Fluid Mechanics III. Springer, Notes on Numerical Fluid Mechanics 77:146–153
33. Kloker M, Konzelmann U, Fasel H (1993) AIAA J 31:620–628
34. Pagella A (1999) Numerische Simulation der Stoß-Grenzschicht-Wechselwirkung an der ebenen Platte. PhD thesis, Universität Stuttgart
35. Dolling DS (2001) AIAA J 39(8):1517–1531
36. Délery JM (1999) Aeronaut J 1:19–34
37. Orszag SA (1971) Stud Appl Math L:293–327
38. Pagella A, Rist U, Wagner S (2002) Phys Fluids 14(7):2088–2101
39. El-Hady NM (1992) Phys Fluids A4:727–743
40. Aymer de la Chevalerie D, De Luca L, Cardone G (1997) Exp Thermal Fluid Sci 15:69–81
41. De Luca L, Cardone G (1995) AIAA J 33:2293–2298
42. El-Hady NM, Verma AK (1983) J Eng Appl Sci 2:213–238
43. (2002) CEAS TRA3 conference proceedings. Royal Aer Soc 33.1–33.13
44. Anderson Jr. JD (1995) Computational Fluid Dynamics. McGraw-Hill
45. Knight D, Yan H, Panaras AG, Zheltovodov A (2003) Progr Aerospace Sci 39:121–184
46. Hirsch C (1998) Numerical Computation of Internal and External Flows, Volume 1. John Wiley & Sons
47. Hirsch C (1990) Numerical Computation of Internal and External Flows, Volume 2. John Wiley & Sons
48. Babucke A (2002) Numerische Untersuchung von instationären Stoß-Grenzschicht Interaktionen und Validierung des zweidimensionalen kompressiblen Navier-Stokes Verfahrens für beliebige Geometrien. Diplomarbeit, Universität Stuttgart
49. Canuto C, Hussaini MY, Quarteroni A, Zang TA (1987) Spectral Methods in Fluid Dynamics. Springer
50. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1992) Numerical Recipes in Fortran. Cambridge University Press, second edition
51. Fedorov A, Shiplyuk A, Maslov A, Burov E, Malmuth N (2003) J Fluid Mech 479:99–124

Mathematical models of filtration combustion and their applications

A.D. Rychkov1 and N.Yu. Shokina2

1 Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia
2 Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia; High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Allmandring 30, 70550 Stuttgart, Germany

Summary. A short review is presented of work on the mathematical modelling of filtration combustion carried out in the research centers of Russia. The main attention is given to computational aspects. Examples of the numerical solution of three typical problems are presented: filtration combustion of a gas mixture in an inert porous medium, combustion of fuel granules in a solid-fuel gas generator, and non-stationary combustion of solid fuel in an automotive safety device (airbag).

1 Introduction

The theory of filtration combustion (FC) is an actively developing area of combustion science. By now, FC processes have been classified, the basic laws of stationary and quasi-stationary FC wave propagation have been obtained, and the stability of wave propagation has been investigated. The combustion limits have been determined, and the physical principles for regulating the structure of the thermal wave and the temperature in the reaction zone have been established. It should be noted that the majority of works are devoted to stationary or quasi-stationary regimes of FC. At the Institute of Chemical Physics of the Russian Academy of Sciences (Chernogolovka), filtration combustion was investigated under the conditions of turbulence wake filtration of a heat carrier through reacting media [1]–[3]. The existence of a stationary combustion wave, and the overheating of the combustion zone with respect to the thermodynamic temperature of adiabatic combustion, were shown theoretically and proved experimentally. The possibility of building up such a “super-adiabatic” process is of fundamental importance, because it opens the way for the combustion of low-calorie systems without additional expenses.


The fundamental questions of filtration gas combustion (FGC) were considered in the works of Novosibirsk researchers [4]–[7]. The main attention was again given to stationary regimes of combustion. The existence of two subsonic combustion regimes, namely with low and high velocities of the combustion wave front, was discovered. The conditions of formation and stable existence of these regimes were investigated. The important role of the heat interaction between the gas and the porous medium, and the significant non-homogeneity of the temperature and concentration fields in the reaction zone in the high-velocity combustion regime, were shown. The number of works devoted to the initiation of the combustion process in porous media is much smaller. This is due to the difficulty of analysing the non-stationary equations that describe the ignition process in such media. Numerical modelling appears to be the most effective tool here. It allows taking into account many features that are inaccessible to analytical approaches and provides information from which general conclusions on the nature of the process can be drawn. In the works of the Institute of Computational Technologies and the Institute of Chemical Kinetics (Siberian Branch of the Russian Academy of Sciences, Novosibirsk), a model of combustion in a porous charge was suggested. The model distinguishes between the temperatures of the gas phase and of the fuel granule surface and takes into account the temperature distribution inside a granule. This made it possible to obtain a realistic picture of non-stationary combustion, including the extinction modes (after the combustion of part of the charge), under changes of the charge heat conductivity, ignition temperature, igniter weight, initial temperature, charge porosity and local speed of charge combustion. Investigations of non-stationary combustion of solid unitary fuels were also carried out by these institutes during the last 10 years [8, 9].
A series of works on non-stationary combustion in the gas phase inside volumes filled with catalyst granules was carried out using numerical modelling at the Institute of Computational Mathematics and Mathematical Geophysics and the Institute of Catalysis (Siberian Branch of the Russian Academy of Sciences, Novosibirsk). The processes taking place both on the surface of a catalyst granule and inside it are taken into account. This made it possible to obtain new knowledge about processes involving granular catalysts and to determine promising directions for the development of the corresponding chemical technologies [10, 11]. The significant difficulties connected with the numerical realization of mathematical models of FC should be noted. These difficulties are due to the stiffness of the relaxation equations that describe physical-chemical transitions, the extremely small spatial extent of the chemical reaction zone, and the ill-conditioning of the gas dynamics equations and the Navier-Stokes equations at small Mach numbers. The first difficulty can be successfully overcome by the use of implicit A-stable difference schemes. As for the second difficulty, at least a few grid nodes must be placed in the combustion zone, which can be a serious problem for extensive flow domains.


Fig. 1. The grid structure

Thus, for gas flames the width of the reaction zone is estimated by the value $\delta = \lambda/(\rho C_p u_b)$, where $\lambda$, $\rho$, $C_p$, $u_b$ are the heat conductivity coefficient, the density, the specific heat and the velocity of flame front propagation, respectively. For instance, for the stoichiometric methane-air mixture $u_b \approx 0.4$ m/s, and the width of the reaction zone is approximately $2 \cdot 10^{-4}$ m. Therefore, for FC modelling in a channel with a length of 0.5 m (the typical size of an experimental facility) about 15000 nodes are required in the case of a uniform grid. Here it is possible to use adaptive moving grids, which are condensed in the neighbourhood of the combustion wave front [12, 13]. However, as theoretical investigations and test calculations have shown, adaptive moving grids can be used successfully only if the spatial steps and the time steps of the grid do not differ too much, even for implicit difference schemes. This is due to the non-linearity of the systems of equations that have to be solved. The FC wave propagation velocity is small, and the typical duration of the physical process can be tens of minutes. Therefore, the numerical modelling of the process requires a large number of time steps, and the benefit of using these grids is reduced. A better approach is suggested in [10]. The combustion zone is allocated to a separate subdomain (Fig. 1), and the calculation is performed there using a fine spatial-temporal grid. A coarse grid with a large time step is used in the other part of the flow domain. The numerical solutions are matched on the boundaries between the fine and coarse grids by linear interpolation. This approach was tested on the solution of several FC problems and showed good results. The adaptive projection-grid method, which was developed in [14, 15], seems to be the most promising method for adaptive grid generation.
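The grid-size estimate quoted above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the figure of six nodes per reaction zone width is an assumption inferred from the quoted numbers (0.5 m divided by 2·10⁻⁴ m gives 2500 zone widths, and 15000 nodes then correspond to six nodes per width); the text itself only asks for at least a few nodes in the zone.

```python
def reaction_zone_width(lam, rho, cp, u_b):
    """Estimate delta = lambda / (rho * C_p * u_b), all inputs in SI units."""
    return lam / (rho * cp * u_b)

def uniform_grid_nodes(channel_length, zone_width, nodes_per_zone):
    """Nodes needed on a uniform grid to resolve the zone with nodes_per_zone nodes."""
    return round(channel_length / zone_width * nodes_per_zone)

if __name__ == "__main__":
    # Values from the text: channel length 0.5 m, zone width about 2e-4 m.
    # nodes_per_zone = 6 is an assumed value, see the lead-in.
    print(uniform_grid_nodes(0.5, 2e-4, 6))
```

The same helper makes it easy to see how quickly the node count grows for longer channels or thinner reaction zones, which is the motivation for the adaptive and two-grid approaches discussed in this section.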
This projection-grid method is well validated theoretically and has been used successfully for problems with moving boundaries, but there are no examples of its use for FC problems. The ill-conditioning of the gas dynamics equations and the Navier-Stokes equations at small Mach numbers is not an issue for the majority of FC problems, because the gas velocity in a porous medium is small and the pressure gradient can therefore be neglected. When the pressure gradient has to be taken into account,


A.D. Rychkov and N.Yu. Shokina

the preconditioning method [16]–[18] is used, which is a generalization of the artificial compressibility method for incompressible flows suggested in [19]. The essence of the method is the following. The non-stationary system of equations is written in the form of conservation laws:

$$\frac{\partial Q}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = R, \tag{1}$$

where $Q = \{\rho, \rho u, \rho v, \rho E, \rho Y_1, \ldots, \rho Y_N\}^T$ is the vector of conservative variables with convective and diffusive fluxes $F = F_c - F_v$, $G = G_c - G_v$, $H = H_c - H_v$ along the coordinate axis directions; $\rho, u, v, w, p, Y_i$ are the density, the projections of the velocity vector on the coordinate axes, the pressure, the mass concentrations of the mixture components and their molecular weights. The derivative of the primitive variables $\hat{Q} = \{\rho, u, v, T, Y_1, \ldots, Y_N\}^T$ with respect to the pseudo-time τ, multiplied by the preconditioning matrix Γ, is added to the left-hand side of system (1). The matrix Γ in

$$\Gamma\frac{\partial \hat{Q}}{\partial \tau} + \frac{\partial Q}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} = R \tag{2}$$

is chosen in such a way that after the linearization of system (2) and writing it in the so-called “delta-form”,

$$\left\{\Gamma\frac{\partial}{\partial \tau} + A\frac{\partial}{\partial x} + B\frac{\partial}{\partial y} + C\frac{\partial}{\partial z}\right\}\Delta\hat{Q} = -\left\{\frac{\partial Q}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} + \frac{\partial H}{\partial z} - R\right\}, \qquad \Delta\hat{Q} = \hat{Q}^{n+1} - \hat{Q}^{n}, \tag{3}$$

the matrices $\Gamma^{-1}A$, $\Gamma^{-1}B$, $\Gamma^{-1}C$ are well-conditioned. The difference scheme for solving system (3) is constructed in the following way (usually these are schemes with LU-decomposition [20]). The iterations in pseudo-time with the step Δτ are organized inside each real time step Δt. After these internal iterations have converged, i.e. when $\Delta\hat{Q} = 0$, the full approximation of system (1) is attained on the next real time level. When the filtration velocity of the gas is small and, therefore, the pressure gradient can be neglected, the systems of equations that describe the different FC models are of convection-diffusion type. The use of central differences for the approximation of the convection terms can then lead to nonphysical oscillations of the numerical solution, which can create artificial ignition sites and significantly distort the real physical picture of the process. The most reasonable way appears to be the use of one-sided upwind differences for the approximation of the convection terms. An example of such a scheme is given in [8]. Let us consider the essence of the scheme using the following model equation as an example:

$$\frac{\partial T}{\partial t} + u\frac{\partial T}{\partial x} = \lambda\frac{\partial^2 T}{\partial x^2}, \tag{4}$$

where T is the temperature, u = u(x, t) is the velocity of the medium flow, and λ is the heat conductivity coefficient. Let us replace the left-hand side of (4) by the equivalent expression

$$\frac{dT}{dt} = \frac{\partial T}{\partial t} + u\frac{\partial T}{\partial x},$$

which is written along the direction dx/dt = u(x, t). This direction is called the characteristic of equation (4). Therefore, equation (4) is written as

$$\frac{dT}{dt} = \lambda\frac{\partial^2 T}{\partial x^2}$$

along dx/dt = u(x, t). To solve it, the following difference scheme is constructed (for simplicity the grid is assumed to be uniform with the step size h):

$$\frac{T_i^{n+1} - T_*^{n}}{\tau} = \delta\,\lambda_i^{n+1}\left(\frac{\partial^2 T}{\partial x^2}\right)_i^{n+1} + (1-\delta)\,\lambda_*^{n}\left(\frac{\partial^2 T}{\partial x^2}\right)_*^{n}. \tag{5}$$



The pattern of this difference scheme is shown in Fig. 2a and Fig. 2b for the case u(x) < 0. The crosses denote the intersection points of the characteristics with the grid lines. At these points the values with the index “*” in scheme (5) are calculated either by upstream quadratic interpolation in space (Fig. 2a) or by quadratic interpolation in time with the use of an additional time level (the scheme then becomes a three-level one). The second partial derivatives are calculated by central differences. The value δ, 0 ≤ δ ≤ 1, controls the approximation order of the scheme. It is easy to see that scheme (5) approximates equation (4) with second order in space and time for δ = 0.5.

Fig. 2. The pattern of difference scheme: (a), (b)

Let us consider the following typical FC problems as examples: the problem of metal combustion in a porous inert medium, where the internal distribution of temperature along the granule radius is taken into account; the problem of modelling the processes in a hard-fuel generator of low-temperature gas; and the problem of non-stationary combustion of hard fuel in an automotive safety device (airbag).
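The characteristic scheme (5) for the model equation (4) can be sketched in a few dozen lines. This is a minimal illustration under assumptions that are not in the original: constant u > 0 and constant λ, a periodic uniform grid, a foot of the characteristic lying within one cell upstream, upstream quadratic interpolation in space only, and Jacobi sweeps in place of a direct solver for the implicit part. The function names `characteristic_step` and `run_demo` are hypothetical.

```python
import math


def characteristic_step(T, u, lam, tau, h, delta=0.5, sweeps=80):
    """One step of scheme (5) on a periodic uniform grid (constant u > 0 and lam)."""
    n = len(T)
    s = u * tau / h  # distance from node i to the foot of the characteristic, in cells
    if not 0.0 <= s <= 1.0:
        raise ValueError("sketch assumes 0 <= u*tau/h <= 1")

    def d2(f):
        # central second difference; negative indices wrap, so the grid is periodic
        return [(f[(i + 1) % n] - 2.0 * f[i] + f[i - 1]) / h**2 for i in range(n)]

    def at_foot(f):
        # upstream quadratic (Lagrange) interpolation through nodes i, i-1, i-2
        w0 = 0.5 * (s - 1.0) * (s - 2.0)
        w1 = s * (2.0 - s)
        w2 = 0.5 * s * (s - 1.0)
        return [w0 * f[i] + w1 * f[i - 1] + w2 * f[i - 2] for i in range(n)]

    # explicit part: values with the index "*" taken on the old time level
    foot_T = at_foot(T)
    foot_d2 = at_foot(d2(T))
    rhs = [a + (1.0 - delta) * tau * lam * b for a, b in zip(foot_T, foot_d2)]

    # implicit part: (1 + 2a) T_i - a (T_{i+1} + T_{i-1}) = rhs_i, by Jacobi sweeps
    # (converges quickly only for moderate a; a direct solver is preferable otherwise)
    a = delta * tau * lam / h**2
    new = rhs[:]
    for _ in range(sweeps):
        new = [(rhs[i] + a * (new[(i + 1) % n] + new[i - 1])) / (1.0 + 2.0 * a)
               for i in range(n)]
    return new


def run_demo(n=64, u=1.0, lam=0.1, steps=40):
    """Advect and diffuse a sine wave; return the maximum error against the exact solution."""
    h = 2.0 * math.pi / n
    tau = 0.5 * h / u
    x = [h * i for i in range(n)]
    T = [math.sin(xi) for xi in x]
    for _ in range(steps):
        T = characteristic_step(T, u, lam, tau, h)
    t = steps * tau
    exact = [math.exp(-lam * t) * math.sin(xi - u * t) for xi in x]
    return max(abs(a - b) for a, b in zip(T, exact))


print("max error:", run_demo())
```

Because the convection term is absorbed into the interpolation at the foot of the characteristic, no central difference of the convective term appears, which is the property the text relies on to avoid nonphysical oscillations.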

2 Modelling of FGC process

2.1 Mathematical modelling

The flow domain is a cylindrical pipe (Fig. 3) with diameter 40 mm and length 165 mm. The pipe is filled with spherical granules of diameter 6.5 mm made of claydite. The gas mixture of air and methane enters through the left boundary of the channel with the constant temperature T0 = 300 K and the velocity u0. The mixture is ignited by a pilot (duty) flame with the temperature 1500 K at the right end of the pipe. The system of equations that describes the one-dimensional non-stationary reacting two-phase flow at constant pressure p0, taking into account the volume occupied by the immovable particles, has the following form:

$$\frac{\partial \rho_1}{\partial t} + \frac{\partial \rho_1 u_1}{\partial x} = 0, \tag{6}$$

$$\rho_1 C_p\frac{\partial T_1}{\partial t} + C_p\rho_1 u_1\frac{\partial T_1}{\partial x} = \frac{\partial}{\partial x}\left(\varepsilon_1\lambda_g\frac{\partial T_1}{\partial x}\right) + \rho_2\gamma(T_s - T_1) + Y_1\rho_1 W(T_1)Q_1, \tag{7}$$

$$\rho_1\frac{\partial Y_1}{\partial t} + \rho_1 u_1\frac{\partial Y_1}{\partial x} = \frac{\partial}{\partial x}\left(\rho_1 D_g\frac{\partial Y_1}{\partial x}\right) - Y_1\rho_1 W(T_1), \tag{8}$$

$$\rho_g = \frac{p_0 M}{R_0 T_1}, \qquad \rho_1 = \varepsilon_1\rho_g, \qquad \varepsilon_1 = 1 - \rho_2/\rho_p. \tag{9}$$

Here u1 is the velocity of the carrying gas; Y1 is the mass concentration of the combustible component of the mixture; ρg, ρp, Cp are the physical densities of the gas and of the hard particles and the specific heat of the gas, respectively; ρ2 is the calculation density of the hard phase, which is determined from the conditions of granule infill; λg, Dg are the coefficients of heat conductivity and diffusion; γ is the interphase coefficient of heat exchange; W(T1) = Kw exp(−E/(RT1)) is the

Fig. 3. The scheme of flow domain


chemical reaction rate, which is described in the framework of a formal kinetics model; Q1 is the heat effect of the chemical reaction. The indices 1 and 2 refer to the gas and the hard phase, respectively; Ts is the temperature of the particle surface. It is determined from the solution of the one-dimensional non-stationary heat conduction equation for a sphere of radius R, taking into account the heat exchange of the sphere with the surrounding gas and the additional heat transfer along the granule “carcass”:

$$\frac{\partial T_p}{\partial t} = \frac{\lambda_p}{c_p\rho_p}\,\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial T_p}{\partial r}\right), \tag{10}$$

with initial and boundary conditions:

$$T_p(0, r) = T_0, \qquad \left.\frac{\partial T_p}{\partial r}\right|_{r=0} = 0,$$

$$\lambda_p\left.\frac{\partial T_p}{\partial r}\right|_{r=R} = \alpha(T_1 - T_p)\big|_{r=R} + \frac{R}{3\varepsilon_2}\frac{\partial}{\partial x}\left(\varepsilon_2\lambda_{eff}\frac{\partial T_s}{\partial x}\right), \qquad T_s = T_p\big|_{r=R},$$

where $\alpha = Nu\,\lambda_g/(2R)$ is the heat exchange coefficient; $\lambda_{eff}$ is the effective heat conduction of the granule “carcass”, the expression for which is taken from [21]: $\lambda_{eff} = (10 + 0.1\,Re_p\,Pr)\,\lambda_g$; T0 is the temperature of the surrounding environment. The coefficient of the heat exchange between the gas and the porous medium is determined as follows:

$$\gamma = \frac{6\,Nu\,\lambda_g}{d_p^2\,\rho_p}, \qquad Nu = 0.395\,Re_p^{0.64}\,Pr^{0.33}, \tag{11}$$

where dp is the granule diameter and $Re_p = u_1 d_p \rho_g/\mu_g$ is the Reynolds number. When the simplified model of heat exchange with the granules is used, in which the temperature is uniformly distributed over the granule volume, the equation for Ts is written as follows:

$$\rho_p\varepsilon_2 c_p\frac{\partial T_s}{\partial t} = \frac{\partial}{\partial x}\left(\varepsilon_2\lambda_{eff}\frac{\partial T_s}{\partial x}\right) + \rho_2\gamma(T_1 - T_s). \tag{12}$$
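The closure relations around (11) are simple enough to evaluate directly. The sketch below codes the Reynolds number, the Nusselt correlation, the interphase coefficient γ, the heat exchange coefficient α and the effective carcass conductivity as small functions. The granule diameter, granule density and inlet velocity are taken from the text; the gas properties (ρg, μg, λg, Pr) are assumed round values for air near 300 K and are not given by the authors for this problem. All function names are hypothetical.

```python
def reynolds(u, d_p, rho_g, mu_g):
    """Re_p = |u| d_p rho_g / mu_g."""
    return abs(u) * d_p * rho_g / mu_g


def nusselt(re_p, pr):
    """Correlation from (11): Nu = 0.395 Re_p^0.64 Pr^0.33."""
    return 0.395 * re_p**0.64 * pr**0.33


def interphase_gamma(nu, lam_g, d_p, rho_p):
    """Interphase heat exchange coefficient from (11): 6 Nu lam_g / (d_p^2 rho_p)."""
    return 6.0 * nu * lam_g / (d_p**2 * rho_p)


def heat_exchange_alpha(nu, lam_g, radius):
    """Surface heat exchange coefficient: Nu lam_g / (2 R)."""
    return nu * lam_g / (2.0 * radius)


def lambda_eff(re_p, pr, lam_g):
    """Effective carcass conductivity: (10 + 0.1 Re_p Pr) lam_g."""
    return (10.0 + 0.1 * re_p * pr) * lam_g


# From the text: granule diameter, granule density, one of the inlet velocities.
d_p, rho_p, u0 = 6.5e-3, 2400.0, 0.33
# Assumed (not from the text): properties of air near 300 K.
rho_g, mu_g, lam_g, pr = 1.16, 1.85e-5, 0.026, 0.7

re_p = reynolds(u0, d_p, rho_g, mu_g)
nu = nusselt(re_p, pr)
print("Re_p =", re_p)
print("Nu =", nu)
print("gamma =", interphase_gamma(nu, lam_g, d_p, rho_p), "W/(kg K)")
print("alpha =", heat_exchange_alpha(nu, lam_g, d_p / 2.0), "W/(m^2 K)")
print("lambda_eff =", lambda_eff(re_p, pr, lam_g), "W/(m K)")
```

Keeping each relation in its own function makes it easy to swap the assumed gas properties for temperature-dependent ones, which a full model would need because T1 varies strongly across the front.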

The boundary conditions for the system (6)–(9) are given as follows: at x = 0 (the left boundary):

$$u = U_0, \qquad T_1 = T_{inp}, \qquad Y_1 = Y_{inp};$$

at x = xk (the right boundary) either the “soft” boundary conditions are given:

$$\frac{\partial T_1}{\partial x} = \frac{\partial Y_1}{\partial x} = 0,$$

or

$$\frac{\partial Y_1}{\partial x} = 0, \qquad T_1 = T_{ig},$$

Fig. 4. The position of the flame front for u0 = 0.165 m/s

Fig. 5. The position of the flame front and its movement velocity for u0 = 0.33 m/s

in the presence of the duty flame there. The gas density is determined from the equation of state (9), and the velocity distribution is obtained from the continuity equation (6). The initial conditions at t = 0 are:

$$u_1 = 0, \qquad T_1 = T_s = T_0, \qquad Y_1 = Y_0.$$

2.2 Some calculation results

As mentioned above, the granules were made of claydite. Its main thermophysical properties were cp = 1090 J/(kg·K), λp = 1 W/(m·K), ρp = 2400 kg/m³. The reaction-rate parameters were: the pre-exponential factor Kw = 2·10¹¹ 1/s, E/R = 27000 K, Q1 = 5.5·10⁷ J/kg. The granule mass was equal to 0.11 kg, which corresponds to the parameter value ε2 = 0.374. The gas medium was the methane-air mixture (Yin = 0.05), which was close to the stoichiometric one. The gas entered through the left boundary with three different velocities, u0 = 0.165 m/s, 0.33 m/s and 0.66 m/s, in order to investigate the influence of the velocity on the behaviour of the filtration combustion process. The main goal of the investigations was to understand the importance of taking into account the dynamics of heat propagation inside each granule, which is described by equation (10) (one-dimensional heat conductivity). The majority of researchers neglect this dynamics, assuming that a uniform distribution of the temperature over the granule volume is sufficient (zero-dimensional model (12)). Figs. 4–6 show the change of the flame front coordinate Xf [m] and the velocity of its movement Vf [m/s] for the different values of the input velocity of the combustible gas mixture for the case of one-dimensional heat conductivity.
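To give a feeling for how stiff the kinetics are, the formal reaction rate W(T) = Kw exp(−E/(RT)) can be evaluated with the parameters quoted above. The sketch below compares the rate at the inlet temperature T0 = 300 K and at the duty flame temperature 1500 K, both taken from the text; the function name `reaction_rate` is hypothetical.

```python
import math

K_W = 2.0e11         # pre-exponential factor, 1/s (from the text)
E_OVER_R = 27000.0   # activation temperature E/R, K (from the text)


def reaction_rate(temperature):
    """Formal-kinetics rate W(T) = Kw * exp(-(E/R) / T), in 1/s."""
    return K_W * math.exp(-E_OVER_R / temperature)


cold = reaction_rate(300.0)    # inlet temperature T0
hot = reaction_rate(1500.0)    # duty flame temperature
print(f"W(300 K)  = {cold:.3e} 1/s")
print(f"W(1500 K) = {hot:.3e} 1/s")
print(f"ratio     = {hot / cold:.3e}")
```

The ratio of the two rates equals exp(72), i.e. about 31 orders of magnitude, which is why the reaction is confined to a thin zone and why the grid resolution near the front matters so much.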


Fig. 6. The position of the flame front and its movement velocity for u0 = 0.66 m/s

After the position of the flame front had stabilized (at least on average), the duty flame was switched off. It can be seen that when the input velocity of the gas mixture is small (Fig. 4), even after 20 minutes only a non-stationary combustion regime, stable on average, was reached. A further increase of the input velocity makes it possible to obtain stationary combustion regimes. Figs. 7–8 show the dynamics of the flame front behaviour for the zero-dimensional model of heat conductivity (12) for two values of the input velocity of the gas mixture. It can be seen that the time needed to reach the stationary regime is noticeably larger than for the one-dimensional model. This difference grows as the input velocity of the gas mixture increases.

Fig. 7. u0 = 0.33 m/s (zero-dimensional model)

Fig. 8. u0 = 0.66 m/s (zero-dimensional model)

For the value of velocity u0 = 0.165 m/s there is no noticeable difference in the behaviour of the values Xf and Vf in both models of heat conductivity.


Fig. 9. The scheme of hard-fuel gas generator

3 Filtration combustion in hard-fuel gas generator

3.1 Mathematical model

Let us consider the modelling of the filtration combustion process in a hard-fuel gas generator, where the granules of hard fuel are the source of the gas that moves in the porous medium. Here the pressure gradient cannot be neglected, and the full set of equations in the form of conservation laws has to be considered. Let us consider the ignition and combustion of a charge made of granular hard fuel with the open porosity ε in a gas generator. The scheme of the gas generator is shown in Fig. 9. The gas generator is a channel with the fuel charge and a filter, which absorbs the fine dispersed hard particles produced by the granule combustion. The ignition device is placed on the left face plane; the right face plane is a Laval nozzle, through which the combustion products leave the gas generator. The fuel granules are heated by the high-temperature combustion products of the igniter, which enter through the left boundary of the charge with the temperature Tig and the constant mass flux G during the time 0 ≤ t ≤ tig. In addition to the gas, these products contain the mass fraction αig of condensed particles. These particles are small enough to assume that their velocity and temperature are equal to the corresponding parameters of the carrying gas. The main assumptions of the process model are the following.

1. The modelling is done in the framework of the continuum model. All main components of the system are considered as two continuous interpenetrating media with their own velocities and temperatures. There is a mutual exchange of mass, momentum and energy between the media.

2. The flow is non-stationary and one-dimensional. The composition of the gas phase is assumed to be homogeneous and is described by the model of an ideal gas with a constant adiabatic exponent.

3. The fuel granules consist of the combustible part and the binding material, which forms the porous carcass after the granule burn-out. The form of the granule is assumed to be spherical. The deviations of its real form from a sphere are taken into account by the form coefficient in the resistance laws. During the operation of the device the granules are assumed to be immovable, and their number per unit volume (the calculation concentration) is always constant. The absorbing filter is an inert medium, which is also modelled by spherical granules.

4. The ignition of a fuel granule occurs when its surface temperature reaches the given value Tb = const. After that the combustion starts with the release of the mass fraction αg of gas and the heat effect Q. The mass rate of granule combustion mb [kg/s] is assumed to be constant, and the size of the granule does not change during the combustion process.

5. The sedimentation of the fine dispersed hard particles of the igniter on the granule surface is taken into account and increases the coefficient of the heat exchange. The change of the hydrodynamic resistance of the porous medium due to the sedimentation of the igniter condensed phase from the stream is neglected.

The system of equations that describes such a flow has the following form:

$$\frac{\partial \rho_1}{\partial t} + \frac{\partial \rho_1 u_1}{\partial x} = J_g, \tag{13}$$

$$\frac{\partial \rho_{ig}}{\partial t} + \frac{\partial \rho_{ig} u_1}{\partial x} = -A_{sed}\,\rho_{ig}, \tag{14}$$

$$\frac{d\rho_2}{dt} = -J_g, \tag{15}$$

$$\frac{\partial \rho_1 u_1}{\partial t} + \frac{\partial \rho_1 u_1^2}{\partial x} + \varepsilon_1\frac{\partial p}{\partial x} = -\rho_2\,\beta\,u_1, \tag{16}$$

$$\frac{\partial \rho_1 h_1}{\partial t} + \frac{\partial \rho_1 h_1 u_1}{\partial x} = \frac{\partial}{\partial x}\left(\varepsilon_1\frac{\lambda_g}{C_p}\frac{\partial h_1}{\partial x}\right) + \rho_2\gamma(T_s - T_1) + J_g C_p T_s, \tag{17}$$

$$p = \frac{R_0\,\rho_g T_1}{M}, \tag{18}$$

where h1 = Cp T1, ρ1 = ε1 ρg, ρ2 = ε2 ρp. Jg and Ased are the gas income due to particle combustion and the sedimentation coefficient of the igniter hard particles, respectively. The other notations are identical to those of Subsection 2.1. The index 1 refers to the carrying gas, and the index 2 to the hard phase. The granule surface temperature Ts is determined by solving equation (10) for each granule with the following boundary condition:

$$\lambda_p\left.\frac{\partial T_p}{\partial r}\right|_{r=R} = \alpha(T_1 - T_p)\big|_{r=R} + \frac{R}{3\varepsilon_2}\left[\frac{\partial}{\partial x}\left(\varepsilon_2\lambda_{eff}\frac{\partial T_s}{\partial x}\right) + \varepsilon_2 A_{sed}\,\rho_{ig} C_{ig}(T_1 - T_s)\right]. \tag{19}$$

After the ignition of the fuel granule, equation (10) is replaced by the condition Ts = Tb. The sedimentation coefficient is determined by the formula

$$A_{sed} = \alpha_{oc}\,|u_1|\,\frac{\pi}{4}\,d_p^2\,n_2, \tag{20}$$

where αoc is an empirical coefficient and n2 is the number of granules per unit volume. The change of the mass of the combustible part of a burning granule and the mass income Jg are determined by the expressions

$$\frac{dm}{dt} = -m_b, \qquad J_g = m_b\,n_2, \tag{21}$$

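where 0 ≤ m ≤ αg m0, and m0 is the initial mass of the granule. The resistance coefficient β is calculated for εg ≤ 0.8 by the Ergun formula:

$$\beta = 150\,\frac{\varepsilon_2\,\mu_g}{(\varepsilon_1 d_p \varphi_2)^2\,\rho_p} + 1.75\,\frac{\rho_g|u_1|}{\varepsilon_1 d_p \varphi_2\,\rho_p}. \tag{22}$$

For εg > 0.8 the resistance coefficient is calculated as the resistance coefficient of a sphere whose mass is equivalent to the mass of the real non-spherical granule:

$$\beta = \frac{3}{4}\,C_D\,\frac{\rho_g|u_1|}{d_p \varphi_2\,\rho_p}\,\varepsilon_1^{-2.65}, \tag{23}$$

$$C_D = \begin{cases} \dfrac{24}{Re_p}\left(1 + 0.15\,Re_p^{0.687}\right), & \text{if } Re_p \le 10^3, \\[2mm] 0.44, & \text{if } Re_p > 10^3, \end{cases} \qquad Re_p = \frac{|u_1|\,d_p\,\rho_g\,\varepsilon_1}{\mu_g}.$$

It is also assumed that the heat contacts that exist between the particles before combustion remain between their carcasses after the burn-out of the combustible mass. Boundary and initial conditions are given as follows: at x = 0 (left boundary):

$$\text{at } t \le t_{ig}: \quad (\rho_1 + \rho_{ig})u_1 = G, \quad T_1 = T_{ig};$$

$$\text{at } t > t_{ig}: \quad (\rho_1 + \rho_{ig})u_1 = 0, \quad \partial T_1/\partial x = 0.$$

At x = L (right boundary): ρ1 u1 = Gout(t), where the mass discharge through the Laval nozzle is calculated using the known formulas (here γ denotes the adiabatic exponent of the gas):

$$G_{out} = \begin{cases} F_{min}\,\dfrac{p}{\sqrt{RT}}\left\{\dfrac{2\gamma}{\gamma-1}\left[\left(\dfrac{p}{p_{out}}\right)^{-2/\gamma} - \left(\dfrac{p}{p_{out}}\right)^{-(\gamma+1)/\gamma}\right]\right\}^{1/2}, & \text{if } \dfrac{p}{p_{out}} < \left(\dfrac{\gamma+1}{2}\right)^{\gamma/(\gamma-1)}, \\[4mm] F_{min}\,\dfrac{p}{\sqrt{RT}}\,\sqrt{\gamma}\left(\dfrac{2}{\gamma+1}\right)^{\frac{\gamma+1}{2(\gamma-1)}}, & \text{if } \dfrac{p}{p_{out}} \ge \left(\dfrac{\gamma+1}{2}\right)^{\gamma/(\gamma-1)}, \end{cases} \tag{24}$$

where Fmin and pout are the relative area of the minimal cross-section of the nozzle and the pressure in the surrounding medium, respectively.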
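The two-branch nozzle formula (24) is the standard isentropic discharge relation, and a short sketch makes the switch between the subsonic and the choked branch explicit. The assumptions here are not from the text: R in (24) is read as the specific gas constant R0/M, the flux is set to zero when the chamber pressure does not exceed the outside pressure, and the numerical values in the usage lines (adiabatic exponent, temperature, relative throat area, pressures) are purely illustrative. Only M = 28 g/mol is taken from the parameter set of this section. The function name `nozzle_mass_flux` is hypothetical.

```python
import math


def nozzle_mass_flux(p, temperature, p_out, k, r_gas, f_min):
    """Mass flux through the nozzle per unit channel area, following (24).

    k is the adiabatic exponent (gamma in (24)); r_gas is assumed to be the
    specific gas constant R0/M; f_min is the relative throat area.
    """
    if p <= p_out:
        return 0.0  # assumption of this sketch: no backflow
    base = f_min * p / math.sqrt(r_gas * temperature)
    critical = ((k + 1.0) / 2.0) ** (k / (k - 1.0))
    ratio = p / p_out
    if ratio < critical:
        # subsonic outflow: depends on the outside pressure
        bracket = ratio ** (-2.0 / k) - ratio ** (-(k + 1.0) / k)
        return base * math.sqrt(2.0 * k / (k - 1.0) * bracket)
    # choked outflow: independent of the outside pressure
    return base * math.sqrt(k) * (2.0 / (k + 1.0)) ** ((k + 1.0) / (2.0 * (k - 1.0)))


R_GAS = 8.314 / 0.028   # J/(kg K), from M = 28 g/mol
K, T_CHAMBER, F_MIN, P_OUT = 1.25, 2000.0, 0.05, 1.0e5   # illustrative values

for p in (1.2e5, 1.5e5, 3.0e5, 1.0e6):
    print(p, nozzle_mass_flux(p, T_CHAMBER, P_OUT, K, R_GAS, F_MIN))
```

The two branches join continuously at the critical pressure ratio, so the boundary condition ρ1 u1 = Gout(t) does not introduce a jump when the flow in the throat becomes sonic.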


3.2 Some calculation results

The calculations were done for a series of variations of the input parameters. The following set of parameters was accepted as the basic one: L = 9 cm, dpg = 0.06 cm, ε2 = 0.21, G = 0.307 g/cm², T0 = 300 K, cpg = 1.2 J/(g·K), ρpg = 2 g/cm³, ρpf = 2.7 g/cm³, λpg = 1.2 W/(m·K), λpf = 0.87 W/(m·K), cpf = 0.65 J/(g·K), dpf = 0.04 cm, Q = 500 J/g, Pr = 0.62, μg = 1.7·10⁻⁵ Pa·s, M = 28 g/mol, αg = 0.5, tig = 0.04 s, Tig = 2200 K, Tb = 690 K. The additional index g refers to the parameters of the fuel granules, and the index f to the filter parameters. The accuracy of the calculations was controlled by decreasing the space and time steps. The calculation results are shown in Figs. 10–15 as the time dependence of the pressure before the nozzle, which is proportional to the mass discharge of gas from the device. For all variants in which extinction did not occur, the process was essentially unsteady, with a pressure maximum. The increase of pressure was due to the progressing propagation of the ignition wave over the surface of the pores. For all considered variants of the input parameter set the combustion regime was convective (the contribution of the heat conductivity along the carcass in (19) was small). The pressure maximum was reached when the flame front came to the end of the charge. After that the gas income decreased with time. When the values of the input parameters G, λpg, ε2g, T0, Tb, mb were changed from one variant to another, either the full burn-out of combustible mass of

Fig. 10. The influence of igniter mass. For G = 0.271 g/cm² less than a quarter of the charge was decomposed (extinction). In the other cases the charge burnt out completely


Fig. 11. The influence of particle heat conductivity. The extinction occurs for λ = 1.6 W/mK

Fig. 12. The influence of charge porosity. The extinction occurs for ε0 = 0.18

the charge or its extinction (the discontinuation of further propagation of the flame over the granule surfaces) occurred. The reason for the extinction is too intensive heat removal through the carcass. At the present stage of modelling, finding the exact limits of full combustion was not the goal. The calculations were performed with a rather large step in the input parameters. The conclusions about the presence or absence of early extinction were made on the basis of the behaviour of


Fig. 13. The influence of initial temperature. The extinction occurs for T0 = 275 K

Fig. 14. The influence of ignition temperature. The extinction occurs for Ts1 = 710 K

the curves of the relative degree of the fuel granule burn-out. This value either approached unity asymptotically (full burn-out) or remained below it during the whole process (partial burn-out). On the presented curves the extinction appears as the absence of a further increase of pressure after the igniter combustion. The critical character


Fig. 15. The influence of linear velocity of combustion r = mb/(ρpg π dpg²). The extinction occurs for r = 0.4 mm/s

of the phenomenon can be seen most clearly in Fig. 14, where the values of the input parameter Tb = 700 K and 710 K turned out to be near the bifurcation point of the solution. Thus, the possibility of critical phenomena is shown for devices of the considered type. These phenomena appear due to the local thermal non-homogeneity, which was not taken into account earlier.

4 Modelling of non-stationary combustion of hard fuel in automotive safety device (airbag)

Nowadays numerical modelling is an effective tool for understanding in detail the complex physical-chemical processes in various technical devices. The airbag has become a very popular individual automotive safety device. It consists of a combustion chamber, filled with granules of a solid monofuel with a comparatively low combustion temperature, connected to a special elastic shell made of a gas-proof fabric. In the initial state the shell is rolled up into a compact roll. After a collision of the automobile with an obstacle, the solid fuel ignition system is triggered. The combustion products fill the shell within 50–80 milliseconds, transforming it into an elastic bag. Fig. 16 shows the scheme of the model of the airbag combustion chamber with two slots as output nozzles. The symmetric configuration of the chamber allows the numerical modelling to be performed in only a quarter of its volume. The fuel particles (granules) have a cylindrical form. The particles of the


Fig. 16. The scheme of airbag combustion chamber

igniter (booster) have a spherical form. All particles are also jointly called the fuel elements. It is assumed that the fuel elements are distributed uniformly in the chamber and that their location does not change during the combustion process. Therefore, the fuel elements are assumed to be immovable, and only their sizes decrease. It is also assumed that the chemical reaction rates are large enough for the combustion processes to be completed near the surface of a fuel element. This allows these processes to be described by source terms in the equations of mass balance and energy balance. The released gas is the combustion product of the fuel elements, and their heat release determines the gas temperature. The following main assumptions are used in the mathematical formulation.

1. The flow is three-dimensional and non-stationary.

2. The continuum model of two interpenetrating media is used. These media are the gas (combustion products) and the burning hard material (fuel elements) as a porous medium.

3. The work of the friction force and of the pressure force is not taken into account in the energy equation for the gas flow because of the small flow velocity. The heat transfer between the fuel elements caused by heat conduction is also neglected.

4. The particles of the booster have a spherical form. The cylindrical form of the fuel granules is replaced by a sphere of equivalent diameter.

5. The material of the fuel elements is homogeneous. The booster particles contain an incombustible part (fine dispersed particles, for example KBO2, K2CO3, which usually occupy 40–50% of the whole capacity). The incombustible part plays an important role in the ignition of a fuel element, therefore the mathematical model has to take it into account.

6. The combustion products consist of a mixture of a perfect gas with a constant adiabatic exponent and incombustible small solid particles, which are the booster combustion product. These particles move with the same velocity as the gas stream. It is assumed that the particles are small enough for the carrying gas to be considered as a two-phase equilibrium medium.

7. The ignition of the booster particles is initiated by the heat exchange with the igniter combustion products, which enter through the left boundary of the booster combustion chamber at the time moment t = 0. It is assumed that the composition of the igniter combustion products is identical to the composition of the carrying gas, but does not contain a solid phase.

8. The temperature distribution inside a fuel element is described approximately by a spline function.

9. The combustion of a fuel element starts when its surface temperature reaches the given value Tv. The combustion rate depends on the local static pressure and is calculated by the formula $r_b = r_{b0}\,(p/p_0)^{\nu}$ with different constants rb0 and ν for the booster particles and the fuel particles.

The spatial flow in the combustion chamber is described by the Navier-Stokes equations, taking into account the exchange of mass, momentum and energy with the fuel granules. The process of fuel granule combustion is described by equations similar to equations (10), (15), (19), (21). The second-order upwind LU difference scheme with TVD properties is used for solving the obtained system of equations. This scheme is close to the scheme of [20]. Unfortunately, the iterations converge poorly when the flow field is calculated in places where the particle concentration changes abruptly. Therefore, the following conditions on the discontinuity surface are used in the mass and energy conservation laws [22]:

$$[\rho u_n] = 0, \qquad [\rho u_\tau] = 0, \qquad \left[C_{pg}T + \frac{u_n^2}{2}\right] = 0,$$

where un, uτ are the normal and tangential (to the discontinuity surface) components of the gas velocity vector.
The flow parameters are determined from the conditions on the discontinuity surface:

$$[p] = j^2\left(\frac{1}{\rho_{g-}} - (1-\varepsilon)^{-1}\frac{1}{\rho_{g+}}\right), \qquad j = \rho u_n, \qquad \left[\frac{1}{\rho}\right] = (1-\varepsilon)^{-1}\frac{1}{\rho_{g+}} - \frac{1}{\rho_{g-}},$$

using one of the flow parameters. The values with the sign + are taken in the two-phase flow domain, and the values with the sign − are taken in the pure gas domain. For example, at un ≥ 0 the pressure value p+ is specified behind the discontinuity surface, and at un < 0 the value p− is given. This approach avoids parasitic oscillations of the numerical solution.

4.1 Some calculation results

A certain difficulty for the successful application of numerical modelling to the airbag working processes is the lack of reliable experimental data on

Fig. 17. The gas pressure on the left boundary of the booster combustion chamber

Fig. 18. The gas temperature on the left boundary of the booster combustion chamber

the thermal and physical properties of the fuel elements and the sedimentation coefficient Ksd. These data significantly influence the ignition and combustion processes. Therefore, the numerical calculations are done for four variants of these main initial data:

Variant 1: Tv = 500 K, λp = 3.25 W/(m·s·K), Ksd = 0.
Variant 2: Tv = 500 K, λp = 13 W/(m·s·K), Ksd = 0.25.
Variant 3: Tv = 500 K, λp = 3.25 W/(m·s·K), Ksd = 0.25.
Variant 4: Tv = 650 K, λp = 3.25 W/(m·s·K), Ksd = 0.

Tv is the fuel ignition temperature. The booster mass is 0.002 kg and the fuel mass is 0.06 kg (ε = 0.218 and ε = 0.408, respectively). The parameter values are taken as follows: rb0 = 0.0001 m/s, ν = 1.0 for the booster; rb0 = 0.002 m/s, ν = 0.6 for the fuel. The booster ignition temperature is Tvb = 550 K. Fig. 17 shows the time dependence of the pressure at the point on the symmetry axis on the left boundary of the booster combustion chamber (here and below the curve numbers correspond to the numbers of the calculation variants). The sharp pressure peaks indicate the fast heating and subsequent ignition of the booster particles because of their small sizes. The subsequent “stratification” of the curves is due to the back influence of the pressure change in the fuel granule combustion chamber after the ignition and burning of the granules. The back pressure waves are seen better on the gas temperature curves (Fig. 18) at the same point. Figs. 19–21 show the parameter changes on the left boundary of the fuel combustion chamber. The character of the pressure change (Fig. 19) and of the gas temperature change (Fig. 20) indicates the significant influence of the parameters λp, Tvp and Ksd on the gas-dynamic processes in the airbag. The fuel granule ignition dynamics is shown in Fig. 21. The influence of the additional heat transport is clearly seen. This heat transport is caused by the sedimentation of the hot fine dispersed particles, which appear due to the

Fig. 19. The gas pressure on the left boundary of the fuel combustion chamber

Fig. 20. The gas temperature on the left boundary of the fuel combustion chamber

Fig. 21. The temperature of fuel granule surface on the left boundary of the fuel combustion chamber

Fig. 22. The gas temperature at t = 1 ms

Fig. 23. The gas temperature at t = 2 ms

booster combustion, on the granule surface. The increase of the heat conductivity coefficient leads to a significant “retardation” of the ignition and to non-uniform combustion of the fuel granules in the combustion chamber. Figs. 22–25 show the dynamics of the processes in the airbag for the basic data variant 3 at the time moments t = 1 ms, 2 ms, 3 ms and 5 ms. The spatial character of the flow at the initial stage of the device operation is clearly seen. The ignition zone spreads significantly faster in the booster combustion chamber than in the fuel combustion chamber. The reason is the difference between the sizes of these fuel elements. Further, as the ignition and the combustion develop, the temperature field and the pressure field level out. This makes it possible to pass to a simpler modelling level, i.e. to the one-dimensional or even the zero-dimensional level.

Fig. 24. The gas temperature at t = 3 ms

Fig. 25. The gas temperature at t = 5 ms
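The pressure-dependent burning-rate law of assumption 9, rb = rb0 (p/p0)^ν, can be evaluated with the constants quoted in Section 4.1 for the booster and the fuel. The reference pressure p0 is not given in the text; the sketch below assumes p0 = 1 atm purely for illustration, and the function name `burning_rate` is hypothetical.

```python
P0_ASSUMED = 101325.0   # Pa; reference pressure p0 is NOT given in the text

# (r_b0 in m/s, nu) as quoted in Section 4.1
BOOSTER = (0.0001, 1.0)
FUEL = (0.002, 0.6)


def burning_rate(p, r_b0, nu, p0=P0_ASSUMED):
    """Linear burning rate r_b = r_b0 * (p / p0)**nu, in m/s."""
    return r_b0 * (p / p0) ** nu


for label, (r_b0, nu) in (("booster", BOOSTER), ("fuel", FUEL)):
    for factor in (1.0, 10.0, 50.0):
        rate = burning_rate(factor * P0_ASSUMED, r_b0, nu)
        print(f"{label:8s} p = {factor:4.0f} p0  ->  r_b = {rate:.4e} m/s")
```

With ν = 1 the booster rate scales linearly with pressure, whereas the fuel rate with ν = 0.6 grows more slowly, so the ratio between the two changes as the chamber pressure rises.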

References

1. Aldushin AP, Seplyarskii BS, Shkadinskii KG (1980) Fiz Goreniya Vzryva 16(1):36–45 (in Russian)
2. Aldushin AP, Merzhanov AG (1988) Theory of filtration combustion: general ideas and state of investigations. In: Babkin VS (ed) Propagation of heat waves in heterogeneous media. Nauka, Novosibirsk (in Russian)
3. Grachev VV, Ivleva TP, Borovinskaya IL, Merzhanov AG (1996) Dokl Ross Akad Nauk 346(5):626–629 (in Russian)
4. Babkin VS, Drobyshevich VI, Laevskii YuM, Potynyakov SI (1982) Dokl Akad Nauk USSR 265(5):1157–1161 (in Russian)
5. Laevskii YuM, Babkin VS, Drobyshevich VI, Potynyakov SI (1984) Fiz Goreniya Vzryva 20(6):3–13 (in Russian)
6. Laevskii YuM, Babkin VS (1988) Filtration combustion of gases. In: Babkin VS (ed) Propagation of heat waves in heterogeneous media. Nauka, Novosibirsk (in Russian)
7. Kakutkina NA, Babkin VS (1999) Fiz Goreniya Vzryva 35(1):60–66 (in Russian)
8. Zarko VE, Gusachenko LK, Rychkov AD (1996) Def Sci J 46(5):425–433
9. Zarko VE, Gusachenko LK, Rychkov AD (1999) J Propuls Power 15(6):345–364
10. Drobyshevich VI (1997) Mathematical modelling of non-stationary hybrid combustion wave. In: Roy GD, Frolov SM, Givi P (eds) Advanced computation and analysis of combustion. ENAS Publishers, Moscow
11. Drobyshevich VI (1988) Propagation of thermal waves in heterogeneous media. Nauka, Novosibirsk (in Russian)
12. Degtyaryov LM, Ivanova TS (1993) Diff Urav 29(7):1179–1192 (in Russian)
13. Weber HJ, Mack A, Roth P (1994) Comb Flame 97:281–295
14. Sleptsov AG, Shokin YuI (1996) Adaptive projection-grid method for elliptic problems. Dokl Akad Nauk USSR 347:164–167 (in Russian)
15. Sleptsov AG, Shokin YuI (1997) J Comp Math Math Phys 37(5):572–586 (in Russian)
16. Withington JP, Shuen JS, Yang V (1991) AIAA Paper 91-0581
17. Shuen JS, Chen KH, Choi Y (1993) J Comput Phys 106:306–318
18. Edwards RJ, Roy ChJ (1998) AIAA Journal 36(2):185–192
19. Chorin AJ (1967) J Comp Phys 2:12–26
20. Yoon S, Jameson A (1988) AIAA J 26:1025–1026
21. Aerov ME, Todes OM, Narinskii DA (1979) Apparatuses with stationary granular layer. Khimiya, Leningrad (in Russian)
22. Sternin LE (1974) Basics of gas dynamics of two-phase flows in jets. Mashinostroenie, Moscow (in Russian)
23. Zeldovich YaB, Leipunskii OI, Librovich VB (1975) Theory of non-stationary powder combustion. Nauka, Moscow (in Russian)
24. Gusachenko LK, Zarko VE, Rychkov AD, Shokina NYu (2003) Fiz Goreniya Vzryva 39(6):97–103 (in Russian)

Computer simulation at VNIIEF

I.D. Sofronov

All-Russia Research Institute of Experimental Physics, Mir Ave. 37, 607190 Sarov, Russia

Summary. The paper describes the organizational basis for computer simulation of complex physical processes at one of the most important Russian “weapon” computational institutions. Over 50 years the team has progressed from manual calculations to the numerical simulation of various physical phenomena and processes in real, i.e. 2D and 3D, geometries. The paper presents the basic research fields and challenges and describes the origin and development of some new scientific approaches to the simulation of solid mechanics, energy transfer and radiation transport, numerical studies of hydrodynamic instability, etc. The paper also addresses the creation of high-performance computers and the organizational measures that give the mathematicians access to the centralized computer kernel in a multiuser network.

The Computational Division of VNIIEF was founded on May 1, 1952 as a self-sustained branch of the Institute. Naturally, mathematicians had come to the institute much earlier; however, the mathematical teams were then parts of the physics divisions. Currently the Computational Division is one of the most important mathematical institutes involved in the development of computational methods for a wide class of applications in computational physics, the program implementation of those methods for various computer generations, the development of multiprocessor systems, and the calculation of numerous applications arising from the design of various devices, chiefly nuclear and thermonuclear weapons. The nature of this work did not allow the mathematicians to report their basic research efforts in open publications; for the same reason we could not use the experience of our foreign colleagues in solving similar problems. As a result, the majority of the resulting computational methods and computer codes are original products. It should be noted that from the earliest days the staff members of our division had to deal with problems whose computational cost was far beyond the hardware capabilities. Therefore it was not enough simply to invent a method to treat a problem; we had to invent a very cost-efficient method requiring minimum hardware resources. Likewise, we had not simply to write a program but to


create a program that would be the most efficient both in the number of arithmetic operations required and in the size of main memory and storage used. This remained true even though the institute purchased the latest, highest-performance domestic computers. During 50 years of work under these conditions our team of mathematicians created more than 400 efficient program complexes, which constitute our most valuable asset. The resulting computational methods, codes and computer operation technologies compensated considerably for our lag behind the West in computer hardware. Our nuclear weapon designers always had computer models that were not inferior, and in some cases were even superior, to those in the West, which greatly contributed to achieving parity in nuclear weapons. I do not mean that our mathematicians are more gifted than their US colleagues. The need for more efficient methods required more scientists to do the work, a more careful choice of computational strategy and more cost-efficient computations. As a result, our mathematicians spent more time and effort developing and continually improving their methods and running the computations. In other words, the final product required more human labor than in the USA. Note that computations are much more important in our sector than in other industries, where the development of new designs is greatly assisted by laboratory modeling and field tests. Laboratory modeling makes it possible to evaluate, at minor cost in time and resources, whether the designers have made good decisions before the prototype is manufactured. Manufacturing and testing prototypes make it possible to examine the functioning of all device units and to make modifications if necessary. The situation is quite different in the nuclear industry. The physical environment resulting from nuclear device performance cannot be reproduced in a laboratory. 
In other words, the computation of nuclear device performance is one of the basic methods of obtaining performance data for a new device; currently it is actually the only way. Therefore the senior leadership focused on improving the computational base of the weapon institutes. Up-to-date equipment was provided, and many talented mathematicians were recruited into the nuclear industry. The nuclear industry trained many eminent mathematicians in both Russia and the USA. The Soviet mathematicians include M.V. Keldysh, A.N. Tikhonov, N.N. Bogolyubov, N.N. Yanenko, V.S. Vladimirov, G.I. Marchuk, A.A. Samarsky and many others. The same situation was observed in the USA. The Los Alamos and Livermore laboratories have the greatest computing centers, equipped with the highest-performance US computers. The US nuclear industry school included top-ranking mathematicians such as R. Courant, K. Friedrichs, J. von Neumann, S. Ulam, R. Richtmyer, P. Lax, F. Harlow and others. As noted above, the complexity of the challenges to be addressed was far beyond the capabilities of our computer hardware. This encouraged us to adopt the technology of calculating a single problem on several computers. As early as the mid-1960s VNIIEF developed a four-computer system called


BESM-4. This system computed hundreds of 2D time-dependent problems in gas dynamics. Having gained experience with the ensemble of BESM-4 computers, we did not hesitate to design a BESM-6 system. In the 1970s our basic computational capability was the BESM-6 multicomputer, which was successfully used until the 1990s. Its maximum configuration included nine computers; however, a single problem was actually run on three or four computers. Connecting more computers did not greatly increase the computation speed. This experience allowed us to adopt parallel computations efficiently on the Elbrus 1-2K multiprocessor from the early 1980s. This system operated successfully for many years, demonstrating highly reliable performance. The maximum configuration comprised five processors and was used by many programs to calculate 2D or 3D problems. The next achievement was the start-up of the Elbrus-2 multiprocessors. Again our experience allowed us to design several program complexes that made efficient use of the peak Elbrus-2 configuration of ten processors. In the 1990s, after the collapse of the Ministries of Electronic and Radio Industries, we had to assemble multiprocessor systems from foreign components. In 1990 a department was created in our division with the basic objective of designing a domestic multiprocessor. In 1992 an i860 computer module was designed, which later served as the basis for an 8-processor system. Finally, in the late 1990s the number of processors increased by dozens of times and eventually grew even further. This considerably reduced the gap in computational capability between VNIIEF and the US nuclear laboratories. At this point VNIIEF has hardware sufficient for large-scale 3D computations. The scale of 3D computations is now limited by inadequate software rather than by low-performance hardware. We shall discuss this below. Finally, a few words about the multiuser network at VNIIEF. The efforts in this area started in the 1960s. 
In the early 1970s a relatively powerful multiuser network kernel was designed, together with a broad data communications network connecting the kernel to approximately a thousand terminals (minicomputers, teletypes, monitors, etc.). Since then the kernel has developed continually and now includes several large-scale computers, servers, workstations, tape and disk drives, and many workplaces from which the institute staff members can access any computer. On-line operation offers several advantages over off-line operation of the same computers. These include the following:

- Connection of multiple machine resources to calculate a large-scale problem;
- Higher robustness of the computer system;
- Considerable saving of external and main memory;
- Fewer data I/O devices;
- Reduced maintenance personnel;
- Easier commissioning of new computers, etc.


The computational division has a sufficiently wide range of software, dominated by domestic products as noted above. The existing software is oriented toward the calculation of various applications in computational physics.

1. The so-called 1D problems, in which the equations and hence the solutions depend on time and one space coordinate, are the most numerous. The equations to be solved are of the form

$$\frac{\partial V}{\partial t} + A\,\frac{\partial V}{\partial X} + B V = C, \qquad (1)$$

where A and B are coefficient matrices, C is the vector of free terms and V is the vector of unknown functions, all of dimension p. Equation (1) is supplemented by appropriate initial and boundary conditions. These equations describe applications in adiabatic gas dynamics, gas dynamics including heat conduction, detonation, strength, and neutron transport in the kinetic and diffusion approximations. The typical initial-boundary value problem setup is as follows: the part of space between r1 and r2 is filled with gas layers with different physical properties, specified by the initial density, pressure, temperature and other quantities. Equation (1) is extended with various equations and parameters describing the properties of the media in the problem layers. These include equations of state, paths, neutron constants, etc. 1D calculations use various numerical methods; gas dynamics is most frequently computed with explicit schemes. The heat conduction equation in high-temperature gas dynamics is calculated with implicit schemes. The neutron transport equation is usually solved with run-through schemes. A unified applications package called “1D Complex” was designed to calculate all 1D problems; it operates dozens of modules implementing various computational methods for a wide spectrum of problems [1]. The most numerous problems include the following:

1. Adiabatic gas dynamics. This uses the “Cross” scheme, the Godunov scheme, and implicit schemes for some classes of problems.
2. High-temperature gas dynamics including, in particular, nonlinear heat conductivity.
3. Gas dynamics including detonation and strength.
4. Multicomponent and multiphase gas dynamics.
5. Multitemperature mode gas dynamics.
6. Gas dynamics including magnetic field and neutron propagation, chemistry and neutron-nuclear reaction kinetics.

All computational modules are handled by a unified service system, which considerably reduces the effort required to develop new methods and programs. The “1D Complex” implements unified systems of equations of state, paths, neutron constants, etc. The development of the equations of state, paths and neutron constants used data reported in Russian and


foreign literature together with experimental data obtained by the institute. Fortran is the basic programming language. The “1D Complex” has been run successfully on various machine types for many years; its programs have been used to calculate tens of thousands of applications. This means that the effort spent on careful programming and on achieving peak efficiency was justified. Typically, the applications to be calculated contain dozens of computational domains and hundreds or thousands of points, and require hundreds or thousands of timesteps. The computation cost ranges from tens of minutes to tens of hours of machine time. Publications on the “1D Complex” by various authors can be found in the review “Voprosy Atomnoi Nauki Tekhniki”, Ser. “Numerical methods and Programs for computational physics”, since 1978 [2, 3].

2. Most of the computer time is spent on 2D calculations, that is, on problems where the system of equations and the solution depend on two space coordinates and time. Lagrangian and Lagrangian-Eulerian coordinates are the most popular, though Eulerian and arbitrarily moving coordinates are also frequently used. The choice of coordinates is determined by the application type. For problems with thin moving shells containing active materials, where the state at each point must be tracked carefully, Lagrangian coordinates are preferable, though various computational effects have to be overcome. If the problem does not contain thin shells and the reactions in the materials depend only weakly on the temperature and density at the point being calculated, Eulerian coordinates can be applied successfully and offer some advantages. Most 2D methods use a so-called regular grid, which allows the data to be ordered as a 2D array. However, for many years we have also successfully used irregular grids, which have no fixed template for choosing the “neighbors” of the point being computed. This will be discussed in more detail later. 
For each point of an irregular grid, a dedicated algorithm chooses the neighbors of the point, taking into account the positions of the points at the given time and in the given part of space. The irregular approach has advantages in some cases and is successfully used for various applications [4]. In fact, elements of the irregular approach are encountered in many programs using regular grids, either as interfaces between two individual regular grids or as an extended template in the case of highly non-orthogonal grids. 2D problems are solved with several program complexes [4]–[6] containing multiple special-purpose programs such as initial data calculation, computational modules implementing various computational methods, interface calculation routines, grid handling routines, computation management routines, data processing routines and others. Unlike the 1D case, we failed to create a unified service for all methods in the 2D programs. Currently we have several finalized operational 2D complexes implementing various computational methods. The lack of a unified service complicates data transfer


from one program to another. However, a convention exists that serves as a de facto standard for the form of the problem data to be transferred to another complex. BESM-6 was the basic machine at our institute for a long time. Of course, this machine is not sufficient for 2D calculations; however, no higher-performance machines were produced in the Soviet Union until the 1980s. For this reason, we had to write some 2D programs for BESM-6 manually in autocode. Manual programming made it possible to write highly efficient programs that exploited the processor resources as intensively as possible (the load reached 90–95%), though the address space was extremely restricted. Clearly, no large-scale problem containing tens of thousands of points could fit in the BESM-6 main memory. So we had to partition the problem into several subproblems and compute the solution at each timestep for each subproblem. The researchers of our institute developed the appropriate partitioning theory and implemented this partitioning in all programs. These efforts made it possible to run parallel computations on several machines at the same time. The experience acquired on BESM-6 multicomputers allowed an easy transition to the Elbrus 1-2K multiprocessor system, and later this experience was used in developing methods and programs for Elbrus-2 and the homogeneous ES-1066 multicomputer system. The parallel mode of the above machines was very useful for the transition to multiprocessors. It turned out that the resulting programs de facto implemented coarse-grain parallelism. Of course, the transition to multiprocessors required intensive finalization efforts; however, we knew the ideas and principles. The basic difficulty was to increase considerably the halt-free capability of the programs. There is a great difference between computing on several processors and on hundreds of processors. 
If we keep in mind that each processor in a multiprocessor system outperforms the previous single-processor machines by a factor of tens or hundreds, we can see that the transition required considerable algorithm modifications. Clearly, this dramatic increase in the performance of the arithmetic processors changed the computer and physical models of the structures to be calculated. It should be noted that the parallelization efforts are not yet complete; we have not comprehensively implemented fine-grain parallelism, where the solution of a single problem splits into hundreds or thousands of branches. Here we have much to do. In the late 1990s we came to domestic multiprocessors. The amount of computation on BESM-6 varied from year to year, ranging from one to two thousand problems per year; this figure was much higher on Elbrus and ES, and in recent years the computed 2D problems number tens of thousands. As for problem size, at the time of BESM-6 and Elbrus an average problem required hundreds of hours, while the most difficult applications required thousands of hours. The mean computation time increased considerably on multiprocessors and, importantly, very time-consuming applications emerged that require 10^16–10^18 arithmetic operations. These applications emerged because the test ban led to more severe computational requirements: first, because the computer models became more complicated; second, because the number of grid points increased; and third, because of the transition to 3D applications. We should also mention that the mathematicians no longer focused on saving arithmetic operations. Now the emphasis was on the development of halt-free programs, deep parallelism, the development of new models, etc. Over approximately four decades the division developed 400 program complexes for the simulation of the various components of the products designed by the institute. A program development technology was created, consisting of several steps:

1. Develop the Task Order and submit it to the science and engineering council of the division.
2. Prepare the development schedule, estimating the amount of effort.
3. Upon completion of the work, the software product is submitted to a competent commission of mathematicians, programmers and theoretical physicists, which tests the program and verifies compliance with the Task Order.
4. The final step is to incorporate the software product into the production computation ensemble.

A new product is not immediately favored by the customers; some time (months or even years) is required for the program to win the confidence of the users. The existing program complexes are our valuable software stock. Most programs were developed by the institute staff members; only 10 or 20 outside programs were adopted for production computations. This is because the applications we deal with are of interest to our institute and perhaps to one or two other similar institutions. Because of the nature of our work we were unable to share our products with US colleagues. 
On the rare occasions when we could obtain US programs, we quickly realized that they were inferior to our products in many respects. In particular, they required more computer time and more memory, and their declared halt-free capability was nothing more than a promotional trick. Yet we remained interested in US software products. Clearly, the US computer capabilities were always higher than ours. This kept us always behind, which allowed us to take the US experience into account in planning our development efforts, and we sincerely appreciate this. Most domestic methods and programs use a regular grid, i.e. a grid that can be converted to a matrix-type rectangular grid through continuous nondegenerate transformations. Naturally, this restrictive definition of the regular grid left good opportunities for the development of nearly regular grids, where the regularity is violated at several points, along some lines, etc. However, completely irregular grids have also been developed at our institute for a long time. The basic feature of irregular grids is that close points and even


metrically neighboring cells do not have to be close on the grid. On irregular grids, each cell stores the addresses of its neighboring cells. This address table may either be constant or change as the solution proceeds. In other words, whereas on a regular grid the left neighbor has an address one lower and the right neighbor an address one higher, on irregular grids metric closeness or neighborhood does not imply closeness or neighborhood on the grid. Obviously, the amount of data per cell increases, together with the amount of arithmetic work and programmer effort. However, these inconveniences are compensated by the freedom we gain in handling the grid. For example, we can insert or remove one or more points using only the neighbor data; the computational domain can change its connectivity; the grid can be refined or coarsened; a polygon with an arbitrary number of sides can be used as a cell; cell convexity can be maintained; etc. Clearly, this freedom can contribute to a greater halt-free capability of the method.

3. The development of 3D methods started many years ago. However, the first actual 3D calculations were run in the 1980s on ES-1066 and Elbrus-2 computers. The development of an operational program for 1D, 2D or 3D calculations generally takes many years. It is not enough to develop an algorithm and implement it in a program. This work is important and time-consuming, but it is only the first step in creating a production program. Next, the program must incorporate well-developed libraries describing material properties, and its parameters must be set. Then the program should be run on a sufficiently representative set of calculations to allow the customers to understand its features and capabilities. The program must be well tested and calibrated against field test data from existing devices; once this is done, the program can gain the confidence of the customers and be used to develop new products. 
A production program should have a number of technological features; for example, one cannot expect the program to be used only by its developers. A production program is basically run by technicians and laboratory assistants, sometimes with the participation of scientists and engineers. The developers intervene only in the case of important and highly complicated calculations. The emergence of computers with performance an order of magnitude higher required the development of new computational methods and programs. A higher-performance machine makes it possible to care less about arithmetic savings and focus on halt-free computations, and to neglect memory savings while keeping in mind the accuracy achieved. By the late 1980s, production calculations were run with a few 3D programs that made it possible to compute the propagation of electromagnetic radiation (light) in complex 3D geometries; to solve problems of adiabatic and high-temperature gas dynamics in complex 3D geometries, including detonation and strength; and to compute the propagation of neutrons, kinetics and energy release together with gas-dynamic flows [7]–[10].


This generation of programs was designed to calculate 3D problems while keeping in mind the efficient solution of 2D applications. We believe the objectives were achieved. By the mid-1990s the new programs had computed many thousands of 2D applications, and people dealing with 3D problems did not feel any restrictions either in computer time or in the computation process. This suggests that the decision to create unified program complexes was reasonable. Moreover, in the 1990s it frequently happened that well-proven 2D programs and methods were extended by their developers to calculate both 2D and 3D problems. The fact is that, although 3D methods and programs exist, 2D applications still need to be calculated. This is primarily because the performance of our computers, including the ten-processor Elbrus-2, was insufficient to solve large-scale problems. This system was good for running 2D applications. It was also well suited for 2.5D applications, for example, where 3D processes are calculated in a 2D geometry or a 3D process is computed in a fixed geometry. Large-scale problems containing hundreds of thousands or millions of points and accounting for a variety of physical processes required higher-performance computers. Such machines have been developed in recent years; naturally, they are multiprocessors. The transition to these computers takes a long time. More precisely, we easily moved to coarse-grain parallelism, where a large-scale application with millions of points requiring more than 10^15 arithmetic operations is parallelized into hundreds of branches. Note that such computations may take several months. In fact, we are now facing the next step, fine-grain parallelism, where the algorithm is parallelized into hundreds or thousands of branches. 
This naturally requires the modification of algorithms and programs, since for the basic classes of applications the existing programs show unacceptably high communication costs compared to the time each processor needs to execute its share of the arithmetic work. The comprehensive implementation of fine-grain parallelism seems to require the modification of many algorithms, not so much because the arithmetic load of the processors is insufficient for up-to-date algorithms as because the high computer performance makes it possible to use much more accurate physical models. Finally, it should be noted that for many years our practice has been to solve the same problem with at least two different methods. This approach, first, reliably guards against researchers' errors and, second, allows the discrepancy between the results to be treated as a measure of the achievable accuracy. However, the calculations have to be run until quasi-convergence is achieved, i.e. the computations are repeated on a grid that is two or four times finer than the original one.
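The quasi-convergence check mentioned above can be illustrated with a minimal sketch. It is not taken from the VNIIEF codes: the toy solver (first-order upwind advection of a sine wave), the function names `upwind_amplitude` and `refine_until_converged`, and the tolerance are all assumptions chosen for illustration. The grid is doubled until a scalar output changes by less than the tolerance between two successive grids.

```python
import math

def upwind_amplitude(n, t_end=1.0, cfl=0.5):
    """Hypothetical stand-in for a production run: advect sin(2*pi*x) with
    unit speed on a periodic unit interval using the first-order upwind
    scheme on n cells, and return max|u| at time t_end."""
    dx = 1.0 / n
    steps = round(t_end / (cfl * dx))   # whole number of steps reaching t_end
    nu = (t_end / steps) / dx           # Courant number dt/dx
    u = [math.sin(2.0 * math.pi * (i + 0.5) * dx) for i in range(n)]
    for _ in range(steps):
        # u[i - 1] with i == 0 wraps to the last cell: periodic boundary
        u = [u[i] - nu * (u[i] - u[i - 1]) for i in range(n)]
    return max(abs(v) for v in u)

def refine_until_converged(solver, n0, tol, max_doublings=6):
    """Repeat the calculation on grids n0, 2*n0, 4*n0, ... and stop when two
    successive results differ by at most tol (assumed stopping rule).
    Returns (converged, [(n, result), ...])."""
    history = [(n0, solver(n0))]
    for _ in range(max_doublings):
        n = 2 * history[-1][0]
        history.append((n, solver(n)))
        if abs(history[-1][1] - history[-2][1]) <= tol:
            return True, history
    return False, history

converged, history = refine_until_converged(upwind_amplitude, n0=50, tol=0.02)
for n, value in history:
    print(f"n = {n:5d}   max|u| = {value:.5f}")
print("quasi-converged:", converged)
```

The difference between successive grids roughly halves with each doubling, as expected for a first-order scheme; in practice the same comparison would also be made between two different methods, as the text describes.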

References

1. Sofronov ID (1978) Vopr Atom Nauki Tekhn. Ser Metodiki Progr Resh Zadach Mat Fiz 1:3–6
2. Voronov EG, Kaplunov MI, Podvalny VG et al. (1978) Ibid 2:3–12
3. Sofronov ID, Dmitriev NA et al. (1976) Computational method for 2D time-dependent problems in gas dynamics in Lagrangian variables. Keldysh Institute, Moscow
4. Batalova MV, Bakhrakh SM et al. (1969) Sigma codes for gas-dynamic applications. In: Proceedings of the National Conference “Numerical methods in viscous fluid mechanics”. Nauka, Novosibirsk
5. Glagoleva YuP, Zhogov BM, Kirianov YuF et al. (1972) Numerical methods in solid mechanics 3:18–55
6. Dementiev YuA, Machinin RF, Nagorny VI et al. (1978) Vopr Atom Nauki Tekhn. Ser Metodiki Progr Resh Zadach Mat Fiz 2:26–28
7. Rasskazova VV, Sofronov ID (1978) Ibid 1:76–87
8. Zmushko VV, Pletenev FA, Saraev VA et al. (1988) Ibid 1:22–27
9. Voronin BL, Skrypnik SI, Sofronov ID (1988) Ibid 3:3–8
10. Shokin YuI, Moskvichev VV (2002) Comp Techn 3:271–273

Mathematical modeling of optical communication lines with dispersion management

M.P. Fedoruk

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. The paper is an overview of the joint efforts of the Institute of Computational Technologies of SB RAS (Novosibirsk, Russia), Aston University (Birmingham, United Kingdom) and the Institute of Automation and Electrometry of SB RAS (Novosibirsk, Russia) in the field of mathematical modeling of dispersion-managed (DM) solitons in transmission fiber lines. The most widely used mathematical models of dispersion-managed solitons and the corresponding numerical techniques are discussed. Some results of numerical simulation for a number of practically important dispersion maps are presented.

1 Introduction

The realization of soliton-based optical data transmission has clearly demonstrated how the results of fundamental soliton theory (see e.g. [1]–[4]) can be successfully exploited in very important practical applications. The recently suggested dispersion management technique allows the bit rate per channel to be increased and the interchannel interaction in WDM (wavelength division multiplexing) systems to be suppressed in comparison with traditional soliton transmission [5]. The DM soliton is a novel type of optical information carrier with many attractive properties (see e.g. [6]–[14] and references therein), combining features of the traditional fundamental soliton and of dispersion-managed non-return-to-zero transmission. Numerical simulations and experiments have revealed the following main features of the DM soliton (see e.g. [14], [15]): the width and chirp (a characteristic of the phase of the pulse) experience large oscillations during the compensation period, leading to “breathing-like” soliton dynamics; the shape of the forming asymptotic pulse is not always a sech shape as for the NLSE (nonlinear Schrödinger equation) soliton, but varies with increasing map strength from a sech shape to a Gaussian shape


and to a flatter waveform. The pulse shape varies along the compensation section from a monotonically decaying profile to a distribution with oscillatory tails; the time-bandwidth product varies with increasing map strength (a measure of the dispersive broadening that is proportional to the difference of the local dispersions multiplied by the fibre lengths and inversely proportional to the square of the pulse width) from 0.32, corresponding to the sech-shaped NLSE soliton, to 0.44, corresponding to the Gaussian pulse, and increases further as the map strength grows; the energy of the stable breathing pulse is well above that of the NLSE soliton with the same pulse width and the corresponding average dispersion; the DM soliton can propagate at zero path-average dispersion and even in the normal dispersion region; the central part of the DM pulse is self-similar, but the far-field oscillating (and exponentially decaying) tails are not. The paper is an overview of the joint efforts of the Institute of Computational Technologies of SB RAS (Novosibirsk, Russia), Aston University (Birmingham, United Kingdom) and the Institute of Automation and Electrometry of SB RAS (Novosibirsk, Russia) in the field of mathematical modeling of dispersion-managed (DM) solitons in transmission fiber lines ([14]–[23]). In the first section the NLSE, the basic model for studying the properties of DM solitons in fiber links, is described. The numerical algorithm for solving this equation and some DM soliton solutions are presented. In the second section the path-averaged model of the DM soliton is described, the numerical algorithm is presented and the results of a numerical study of the properties of a path-averaged optical soliton in double-periodic DM systems are shown. In the third section the TM-model, a system of ordinary differential equations (ODE) describing the DM soliton dynamics, is discussed. 
It is shown that, instead of solving a partial differential equation, two ordinary differential equations can be solved to approximate signal transmission with good accuracy. The conclusions are presented in the last section.

2 Basic mathematical model. Nonlinear Schrödinger equation

The optical pulse propagation in a cascaded transmission system with varying dispersion is governed by the following equation [24]:

$$i\,\frac{\partial E}{\partial z} + \frac{\lambda_0^2 D(z)}{4\pi c_l}\,\frac{\partial^2 E}{\partial t^2} + \frac{2\pi n_2}{\lambda_0 A_{eff}}\,|E|^2 E = i\Big[-\gamma(z) + r_k \sum_{k=1}^{N} \delta(z - z_k)\Big]E = i\,G(z)E. \qquad (1)$$

Here $z$ is the propagation distance in [km], $t$ is the retarded time in [ps], $|E|^2 = P$ is the optical power in [W], and $D(z)$ is the group velocity dispersion measured in [ps/(nm · km)]. The dispersion management is assumed to be periodic with period $L$: $D(z + L) = D(z)$; $z_k$ are the amplifier locations. Periodic amplification with period $Z_a$ is considered. If $\gamma = \gamma_k$ is constant between two consecutive amplifiers, then $r_k = [\exp(\gamma_k Z_a) - 1]$ is the amplification coefficient after the fiber span between the $(k-1)$-th and the $k$-th amplifier. $n_2$ is the nonlinear refractive index, $A_{eff}$ is the effective fiber area, $\gamma = 0.05 \ln(10)\,\alpha$ (with $\alpha$ in [dB/km]) is the loss of the corresponding fiber, $c_l$ is the speed of light, and $\lambda_0 = 1.55$ [μm] is the carrier wavelength. The general case in which $L$ and $Z_a$ are rationally commensurable is considered, namely $n Z_a = m L = Z_0$ with integer $n$ and $m$. It is customary to make the following transformation from the original optical field $E(z, t)$ to the scaled envelope

$$A(z, t) = E(z, t)\,\exp\Big[-\int_0^z G(z')\,dz'\Big].$$

Substituting $E = A \exp[\int_0^z G(z')\,dz']$ into (1) cancels the right-hand side, so the evolution of the scaled envelope $A$ is given by the NLSE with periodic coefficients:

$$iA_z + d(z)A_{tt} + c(z)|A|^2 A = 0, \qquad (2)$$

where

$$c(z) = \frac{2\pi n_2}{\lambda_0 A_{eff}}\,\exp\Big[2\int_0^z G(z')\,dz'\Big], \qquad d(z) = \frac{\lambda_0^2 D(z)}{4\pi c_l}. \qquad (3)$$

2.1 Numerical algorithm

We consider the split-step Fourier method for solving the NLSE (1), following [24]. Equation (1) is written in the form

$$ \frac{\partial E}{\partial z} = (\hat D + \hat N)E, \tag{4} $$

where $\hat D$ is the differential operator, which accounts for the dispersion and the loss in a linear medium, and $\hat N$ is the nonlinear operator, which governs the effect of fiber nonlinearities on pulse propagation. These operators are given by the formulas

$$ \hat D = i\frac{\lambda_0^2 D(z)}{4\pi c_l}\frac{\partial^2}{\partial t^2} - \gamma(z), \tag{5} $$

$$ \hat N = i\frac{2\pi n_2}{\lambda_0 A_{eff}}|E|^2. \tag{6} $$

In general, the dispersion and the nonlinearity act together along the length of the fiber. The split-step Fourier method obtains an approximate


solution by assuming that the dispersive and nonlinear effects act independently during the propagation of the optical field over a small distance $\Delta z$. More specifically, the propagation from $z$ to $z + \Delta z$ is carried out in two steps. In the first step the nonlinearity acts alone, and $\hat D = 0$ in (4). In the second step the dispersion acts alone, and $\hat N = 0$ in (4). Mathematically,

$$ E(z + \Delta z, t) = \exp(\Delta z\,\hat D)\exp(\Delta z\,\hat N)\,E(z,t). \tag{7} $$

The exponential operator $\exp(\Delta z\,\hat D)$ can be evaluated in the Fourier domain using the formula

$$ \exp(\Delta z\,\hat D)\,E(z,t) = F_T^{-1}\exp[\Delta z\,\hat D(i\omega)]\,F_T\,E(z,t), $$

where $F_T$ denotes the Fourier-transform operation, $\hat D(i\omega)$ is obtained from (5) by replacing the differential operator $\partial/\partial t$ by $i\omega$, and $\omega$ is the frequency in the Fourier domain. This method is of second order with respect to the step size $\Delta z$ [24]. In the calculations we use the symmetric form of the split-step Fourier method:

$$ E(z + \Delta z, t) = \exp\Big(\frac{\Delta z}{2}\hat N\Big)\exp(\Delta z\,\hat D)\exp\Big(\frac{\Delta z}{2}\hat N\Big)E(z,t). \tag{8} $$

The most important advantage of the symmetrized form (8) is that the leading error term is of third order with respect to the step size $\Delta z$.

2.2 Examples of computation

This section presents the results of numerical simulations, based on (1), of the properties of dispersion-managed optical solitons for a system with so-called short-scale dispersion management and for a system with backward Raman amplification. In the case of the short-scale dispersion management [16]-[18] we choose the amplifier distance $Z_a$ and the two-step dispersion map with the dispersion compensation period $L = Z_a/J$ [km]. The dispersion is

$$ d(z) = \begin{cases} d + \langle d\rangle, & \text{if } \dfrac{k}{J} < z < \dfrac{k+a}{J},\\[6pt] \dfrac{d\,a}{a-1} + \langle d\rangle, & \text{if } \dfrac{k+a}{J} < z < \dfrac{k+1}{J}, \end{cases} \qquad k = 0, 1, 2, \ldots, J-1. $$

The parameter $a \in (0,1)$ describes the position of the step. The following parameters are used in the simulations: the dispersion in the two-step map ($a = 1/2$) is equal to ±16 + 0.1 ps/(nm · km) (see Fig. 1),

Fig. 1. The evolution of the soliton peak power (right top picture), chirp (left bottom picture) and full width at half maximum (right bottom picture) along one section for the transmission system with the short-scale dispersion map (left top picture). [Panels: dispersion in ps/(nm · km), peak power in mW, chirp, and pulse width in ps, each plotted against the propagation distance in km.]

the nonlinear coefficient $\sigma = 2\pi n_2/(\lambda_0 A_{eff}) = 2.43$ W$^{-1}$ km$^{-1}$, and the fiber loss $\alpha = 0.21$ dB/km. The amplification distance is equal to 40 km, and the dispersion compensation length is equal to 4 km. Fig. 1 shows that the evolution of the soliton parameters over one period is asymmetric because of the loss. The rapid variations of the pulse width, peak power and chirp are accompanied by the exponential decay of the power. Nevertheless, the numerical simulations have revealed that a true periodic solution exists, which reproduces itself at the end of the compensation cell (in this case, at the end of the amplification period). The observed DM soliton is very stable and propagates without radiation, as seen in Fig. 2 (the system parameters are the same as for Fig. 1). Fig. 2 illustrates the chirp of the DM soliton versus the width. The left and right figures show this dependence for the first and for the 140-th sections, respectively.

Now we consider the properties of dispersion-managed solitons in the transmission system with distributed backward Raman amplification [19]. In the case of the backward pumping configuration, $G(z)$ in (1) is given by the formula [19]:

$$ G(z) = -\gamma(z) + g_0\exp[-2\gamma_p(Z_a - z)]. \tag{9} $$

Fig. 2. The chirp versus the width of the DM soliton for the first (left picture) and for the 140-th (right picture) sections. [Both panels: chirp plotted against pulse width in ps.]

Here $g_0$ is related to the injected pump power, which is chosen to recover the pulse power at the end of each section:

$$ \int_0^{Z_a} G(z)\,dz = 0, $$

and the fiber loss at the pump wavelength is $\gamma_p = 0.05 Z_a \ln 10\, \alpha_p$. The transmission line is composed of the standard fibre (SMF) followed by the dispersion compensating fibre (DCF). The transmission line length is 119.425 km (102 km of SMF and 17.425 km of DCF). The fiber parameters used in the calculation are summarized in Table 1.

Table 1. Parameters of the system

Fibre parameters                                          SMF      DCF
Length (amplifier spacing) in [km]                        102      17.425
Dispersion D at 1553 nm in [ps/(nm · km)]                 +16.4    −96
Dispersion slope dD/dλ in [ps/(nm² · km)]                 +0.06    −0.18
Loss in [dB/km]                                           0.21     0.5
Nonlinear refractive index n2 in [10^−20 · m²/W]          2.67     2.67
Effective fiber area Aeff in [μm²]                        80       26
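The condition that the gain integrated over one amplification period vanishes can be solved for $g_0$ in closed form when the loss is piecewise constant. The following is a minimal sketch under that assumption; the function names and the pump loss value of 0.25 dB/km are hypothetical and do not come from the paper, and the pump loss coefficient is passed in directly rather than derived from a formula.

```python
import math

def loss_coefficient(alpha_db_per_km):
    """gamma [1/km] from the fibre loss alpha [dB/km]: gamma = 0.05 * ln(10) * alpha."""
    return 0.05 * math.log(10.0) * alpha_db_per_km

def raman_gain_g0(spans, gamma_p, z_a):
    """g0 that makes the integral of G(z) over [0, z_a] vanish.

    spans: list of (length_km, alpha_db_per_km), in order along the line.
    """
    total_loss = sum(length * loss_coefficient(alpha) for length, alpha in spans)
    # Integral of exp[-2 gamma_p (z_a - z)] over [0, z_a].
    pump_integral = (1.0 - math.exp(-2.0 * gamma_p * z_a)) / (2.0 * gamma_p)
    return total_loss / pump_integral

def integrated_gain(spans, gamma_p, z_a, g0, points_per_span=2000):
    """Midpoint-rule check of the integral of G(z) from formula (9)."""
    total, z_start = 0.0, 0.0
    for length, alpha in spans:
        h = length / points_per_span
        gamma = loss_coefficient(alpha)
        for i in range(points_per_span):
            z = z_start + (i + 0.5) * h
            total += h * (-gamma + g0 * math.exp(-2.0 * gamma_p * (z_a - z)))
        z_start += length
    return total

SPANS = [(102.0, 0.21), (17.425, 0.5)]   # SMF and DCF lengths and losses from Table 1
Z_A = 119.425                            # total line length in km
GAMMA_P = loss_coefficient(0.25)         # hypothetical pump loss, not from the paper
G0 = raman_gain_g0(SPANS, GAMMA_P, Z_A)
```

Calling `integrated_gain(SPANS, GAMMA_P, Z_A, G0)` should return a value close to zero, which is the power-recovery condition stated above.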


Fig. 3. The evolution of the soliton peak power (left bottom picture), chirp (right top picture) and full-width at half maximum (right bottom picture) along one section for the transmission system with Raman amplification and two-step dispersion map (left top picture)

The form of the dispersion map and the evolution of the soliton peak power, width and chirp are shown in Fig. 3. Fig. 4 shows the slow dynamics of the DM soliton. The pulse is shown stroboscopically (logarithmic scale) at the ends of the amplification sections. One can see that DM soliton is stable and travels along the system without any radiation.
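The symmetric split-step scheme (8) of Section 2.1 can be illustrated on the scaled equation (2). The sketch below assumes constant coefficients $d$ and $c$ and a periodic time window, and uses a small pure-Python FFT so that it runs without external libraries; all function names are illustrative. It is checked against the fundamental soliton $A = \mathrm{sech}(t)\exp(iz/2)$ of $iA_z + \tfrac12 A_{tt} + |A|^2A = 0$, not against the DM systems of the paper.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(x):
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def split_step(a, dt, d, c, dz, steps):
    """Symmetric split-step for i A_z + d A_tt + c |A|^2 A = 0 (constant d, c)."""
    n = len(a)
    omega = [2.0 * math.pi * (k if k < n // 2 else k - n) / (n * dt) for k in range(n)]
    # Dispersion over a full step: A_tt -> -omega^2 A in the Fourier domain.
    disp = [cmath.exp(-1j * d * w * w * dz) for w in omega]

    def half_nonlinear(field):
        # Nonlinearity over half a step: pure phase rotation by c |A|^2 dz / 2.
        return [v * cmath.exp(1j * c * abs(v) ** 2 * dz / 2.0) for v in field]

    for _ in range(steps):
        a = half_nonlinear(a)
        spec = fft(a)
        a = ifft([s * p for s, p in zip(spec, disp)])
        a = half_nonlinear(a)
    return a

def soliton_test_error(n=128, t_max=16.0, dz=0.01, steps=100):
    """Max deviation from the exact soliton sech(t) exp(i z / 2) at z = dz * steps."""
    dt = 2.0 * t_max / n
    t = [-t_max + j * dt for j in range(n)]
    a0 = [complex(1.0 / math.cosh(x)) for x in t]
    a1 = split_step(a0, dt, 0.5, 1.0, dz, steps)
    z = dz * steps
    exact = [cmath.exp(0.5j * z) / math.cosh(x) for x in t]
    return max(abs(u - v) for u, v in zip(a1, exact))
```

For a periodic map, one would presumably replace the constant `d` and `c` by their values on each step, which the sketch does not attempt.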

3 Path-average model of DM soliton

If the characteristic nonlinear length of the pulse is larger than the period of the dispersion variations, then an averaging approach can be applied in order to simplify the basic equation. The resulting path-averaged Gabitov-Turitsyn equation [8], [20], [25]-[27], written in the spectral domain, has the following form:

$$ i\Psi_z - \langle d\rangle\omega^2\Psi + G(\Psi,\omega) = 0, \tag{10} $$

where $\Psi(\omega)$ is the Fourier transform of the averaged variable, $\langle d\rangle = \int_0^1 d(z)\,dz$ is the average dispersion, and $G(\Psi,\omega)$ is a nonlinear integral operator:

$$ G(\Psi,\omega) = \int T_{\omega 123}\,\Psi^*(\omega_1)\Psi(\omega_2)\Psi(\omega_3)\,\delta(\omega + \omega_1 - \omega_2 - \omega_3)\,d\omega_1\,d\omega_2\,d\omega_3, \tag{11} $$

with a matrix element $T_{\omega 123}$, which is a complex function of the specific combination $\Delta\Omega = \omega^2 + \omega_1^2 - \omega_2^2 - \omega_3^2$:

$$ T_{\omega 123} = T(\Delta\Omega) = \int_0^1 c(z)\,e^{iR(z)\Delta\Omega}\,dz. \tag{12} $$

Fig. 4. The evolution of the DM soliton over many periods plotted on a logarithmic scale

The function $R(z)$ is defined by the equation $R_z = d(z) - \langle d\rangle$. Equation (10) has a standard form, which is typical for models describing a four-wave interaction with a quadratic dispersion law. Therefore, the specific properties of the model are given by the dependence of the matrix element $T$ on $\Delta\Omega$. In the case of the so-called weak dispersion management (“small” overall effect of a variation of $R(z)$) eq. (10) can be reduced to the NLSE with constant coefficients [26]. In this section we construct numerically exact periodic solutions (DM solitons) of the path-average model (10) for a range of practical DM lines. Similarly to the well-studied NLSE ($T = 1$), we seek a soliton solution of (10) in the following form: $\Psi(\omega, z) = \psi(\omega)\exp(i\lambda^2 z)$. The equation for the DM soliton shape $\psi(\omega)$ takes the form:

$$ (\lambda^2 + \langle d\rangle\omega^2)\,\psi = G(\psi,\omega). \tag{13} $$

Note that for a real matrix element $T$ a solution of this equation with a real function $\psi(\omega)$ can be found. In the general case the matrix element $T$ is a real function only for particular subclasses of transmission systems and the corresponding coefficients $d(z)$ and $c(z)$. The well-known and well-studied case of a real matrix element is a lossless two-step DM system with the matrix element $T = \sin(s\Delta\Omega)/(s\Delta\Omega)$, where $s$ is the map strength. Another important example of a real matrix element is a long-scale dispersion-managed line [21]. Note, however, that for a short-scale dispersion management [17, 28] the corresponding matrix element is complex. We use the terms “long scale” or “short scale” if the length of the dispersion management period is, respectively, greater or smaller than the distance between the amplifiers.

3.1 Numerical algorithm

An effective numerical method for finding a soliton solution of (13) was suggested by Petviashvili in [29], [30] and was applied to DM soliton problems in [31]-[33]. The iterations used in the Petviashvili method require the computation of the integral operator $G(\psi,\omega)$. After a single integration using the delta function, the operator $G(\psi,\omega)$ contains a double integral. Therefore, a direct computation of $G(\psi,\omega)$ requires $N^3$ operations in general, where $N$ is the number of grid points. Here we apply a novel effective numerical algorithm to solve this problem. The idea of our method is to approximate the matrix element $T(\Delta\Omega)$ by an appropriate set of functions. This approximation allows us to apply fast computation of convolutions and to reduce the number of operations to $M N \log_2(N)$, where $M$ depends on the approximation of $T(\Delta\Omega)$. If the matrix element $T$ is a fast oscillating function, then $T$ can be approximated by a Fourier series. In this case the computation reduces to a computation of convolutions, and $M$ is equal to the number of terms in the Fourier series.
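The Petviashvili iteration can be sketched for the simplest case $T = 1$, where (13) reduces in the time domain to $-\lambda^2 u + d\,u_{tt} + u^3 = 0$ with the known solution $u = \sqrt{2}\,\lambda\,\mathrm{sech}(\lambda t/\sqrt{d})$. The code below is an illustration under that assumption, with the standard stabilizing exponent 3/2 for a cubic nonlinearity; it is not the algorithm of the paper for a general matrix element, and all names are hypothetical.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(x):
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def petviashvili(n=128, t_max=16.0, lam=1.0, d=1.0, iterations=60):
    """Iterate u_hat <- S^(3/2) * FT[u^3] / (lam^2 + d omega^2) for the case T = 1."""
    dt = 2.0 * t_max / n
    t = [-t_max + j * dt for j in range(n)]
    omega = [2.0 * math.pi * (k if k < n // 2 else k - n) / (n * dt) for k in range(n)]
    lin = [lam * lam + d * w * w for w in omega]      # linear operator in Fourier space
    u = [math.exp(-x * x) for x in t]                 # arbitrary even initial guess
    for _ in range(iterations):
        u_hat = fft([complex(v) for v in u])
        n_hat = fft([complex(v ** 3) for v in u])     # nonlinear term, T = 1
        # Stabilizing factor: equals 1 on an exact solution.
        num = sum(l * abs(a) ** 2 for l, a in zip(lin, u_hat))
        den = sum((a.conjugate() * b).real for a, b in zip(u_hat, n_hat))
        s = num / den
        u_hat = [s ** 1.5 * b / l for b, l in zip(n_hat, lin)]
        u = [v.real for v in ifft(u_hat)]
    return t, u
```

With the default arguments the returned profile can be compared with $\sqrt{2}\,\mathrm{sech}(t)$; for a general $T(\Delta\Omega)$ the line computing `n_hat` would have to be replaced by an evaluation of the operator $G(\psi,\omega)$.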
The alternative approximation allowing an application of the fast convolution algorithm is a trigonometric approximation:

$$ T(\Delta\Omega) = \sum_{n=0}^{M} T_n\exp(iR_n\Delta\Omega), \tag{14} $$

where $R_n$ are some coefficients. The first way to obtain the trigonometric series is to find the Fourier series of the function $T(\Delta\Omega)$ on the interval of integration. The second way is to make use of the integral form of $T(\Delta\Omega)$. Using a quadrature formula for this integral we obtain the approximation:

$$ T(\Delta\Omega) = \sum_{m=0}^{M} W_m\,c(z_m)\,e^{iR(z_m)\Delta\Omega}, \tag{15} $$

where $W_m$ are the weight coefficients of the quadrature formula. Each term of the Fourier series can be factored with respect to $\omega_k$; therefore the corresponding integral can be computed with the help of two sequential convolutions. The matrix element $T(\Delta\Omega)$ has obvious symmetries; therefore the integral operator is presented in a symmetric form. Integrating over $\omega_1$ and introducing the new symmetric variables $\omega_2 = \bar\omega_1 + \omega$, $\omega_3 = \bar\omega_2 + \omega$, we obtain the integral

$$ \int T_n e^{iR_n\Delta\Omega}\,\psi^*(\omega + \bar\omega_1 + \bar\omega_2)\,\psi(\omega + \bar\omega_1)\,\psi(\omega + \bar\omega_2)\,d\bar\omega_1\,d\bar\omega_2, \tag{16} $$

where $\Delta\Omega = \omega^2 + (\omega + \bar\omega_1 + \bar\omega_2)^2 - (\omega + \bar\omega_1)^2 - (\omega + \bar\omega_2)^2$. Factoring the matrix element, we obtain the following symmetric form, which is suitable for fast computation:

$$ T_n e^{iR_n\omega^2}\int \Big[e^{-iR_n(\omega+\bar\omega_1+\bar\omega_2)^2}\psi(\omega+\bar\omega_1+\bar\omega_2)\Big]^* \cdot \Big[e^{-iR_n(\omega+\bar\omega_1)^2}\psi(\omega+\bar\omega_1)\Big]\Big[e^{-iR_n(\omega+\bar\omega_2)^2}\psi(\omega+\bar\omega_2)\Big]\,d\bar\omega_1\,d\bar\omega_2. \tag{17} $$
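The factored form (17) can be evaluated by inversion into the time domain, where the double convolution becomes a pointwise cubic product. The sketch below is a discrete, periodic (circular) analogue on an integer frequency grid, which is an assumption made for illustration; it compares the $O(N^3)$ direct sum for one term $T_n e^{iR_n\Delta\Omega}$ with the FFT-based evaluation. Function names are hypothetical.

```python
import cmath
import math
import random

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1.0 if inverse else -1.0
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(x):
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def kernel_term_direct(psi, omega, t_n, r_n):
    """O(N^3) sum over k2, k3 with k1 fixed by the delta function (indices mod N)."""
    n = len(psi)
    out = []
    for k in range(n):
        acc = 0j
        for k2 in range(n):
            for k3 in range(n):
                k1 = (k2 + k3 - k) % n
                d_omega = omega[k] ** 2 + omega[k1] ** 2 - omega[k2] ** 2 - omega[k3] ** 2
                acc += cmath.exp(1j * r_n * d_omega) * psi[k1].conjugate() * psi[k2] * psi[k3]
        out.append(t_n * acc)
    return out

def kernel_term_fast(psi, omega, t_n, r_n):
    """Same quantity in O(N log N): factor the phase as in (17), go to the time domain."""
    n = len(psi)
    phi = [cmath.exp(-1j * r_n * w * w) * p for w, p in zip(omega, psi)]
    v = ifft(phi)
    cubic = fft([x.conjugate() * x * x for x in v])
    # The factor n*n compensates the 1/n normalization of the three inverse transforms.
    return [t_n * cmath.exp(1j * r_n * w * w) * n * n * c for w, c in zip(omega, cubic)]

def max_difference(n=16, t_n=0.7, r_n=0.013, seed=1):
    rng = random.Random(seed)
    psi = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    omega = [float(k if k < n // 2 else k - n) for k in range(n)]
    direct = kernel_term_direct(psi, omega, t_n, r_n)
    fast = kernel_term_fast(psi, omega, t_n, r_n)
    return max(abs(a - b) for a, b in zip(direct, fast))
```

Summing such terms over $n = 0, \ldots, M$ gives the operation count $M N \log_2 N$ quoted in Section 3.1; on a non-periodic grid, zero padding would be needed to avoid the wrap-around that this sketch accepts.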

In order to calculate this integral, either the fast algorithm for convolution and correlation [34] or the inversion into the time domain can be used. The latter approach is considered in [33] for the particular lossless two-step model. The general case is considered in [22].

3.2 Examples of computation

We apply our algorithm to find the DM soliton solution in the general case with different periods of power and dispersion oscillations. We consider different matrix elements corresponding to practical fiber optical lines [23]. The simplest important example of a matrix element is $T(\Delta\Omega) = \sin(s\Delta\Omega/2)/(s\Delta\Omega/2)$, where $s$ is the dispersion map strength. This matrix element arises for the lossless equation with the two-step dispersion map. We consider fiber lines with different (but rationally commensurable) periods of power and dispersion oscillations. Namely, we analyze two opposite limits: the short-scale dispersion management and the long-scale two-step dispersion map.

Firstly, we consider the case of the long-scale two-step dispersion management with $L \ge Z_a$ [21]. Let the distance between optical amplifiers be $Z_a$ [km] and $L = 2K \cdot Z_a$ [km], where $K = 1, 2, \ldots$. The dispersion is

$$ d(z) = \begin{cases} d + \langle d\rangle, & \text{if } 0 < z < K,\\ -d + \langle d\rangle, & \text{if } K < z < 2K. \end{cases} $$

The mean-free function $R$ defined above can be found using the formula:

$$ R(z) = \begin{cases} d\,\dfrac{z - K/2}{2}, & \text{if } 0 < z < K,\\[6pt] -d\,\dfrac{z - 3K/2}{2}, & \text{if } K < z < 2K, \end{cases} $$

and $c(z) = c_0\exp(-2\gamma z)$ if $0 < z < 1$. The matrix element $T(\Delta\Omega)$ of such a system is

$$ T(X) = B(G)\,\frac{1}{1 + [2X/\ln G]^2}\left\{\cos[X] + \frac{2X}{\ln G}\,\frac{G+1}{G-1}\,\sin[X]\right\}\frac{\sin[XK]}{K\sin[X]}, \tag{18} $$

$$ X = \frac{\Delta\Omega\,Z_a\,d}{2L} = \frac{\Delta\Omega\,d}{4K}, \qquad B(G) = \frac{G-1}{G\ln G}. $$

Here the gain is $G = \exp[2\gamma Z_a]$ ($\gamma$ is the fiber loss). For the numerical computation we choose the variation of the dispersion $d = 0.5$. Fig. 5 shows the power and the spectral power of the true DM solitons obtained as a solution of (13) for the matrix element (18) and $K = 5$ (ten amplifiers per dispersion period).

Fig. 5. The power and the spectral power of the DM soliton for K = 5 (ten amplifiers for the dispersion period) with ⟨d⟩ = 0.005

Secondly, we consider a short-scale dispersion management with $L \ll Z_a$ [18, 28]. We choose the amplifier distance $Z_a$ and the two-step dispersion map with the dispersion compensation period $L = Z_a/J$ [km]. The mean-free function $R$ defined above can be found using the formula:

$$ R(z) = \begin{cases} d\left(z - \dfrac{k}{J} - \dfrac{a}{2J}\right), & \text{if } \dfrac{k}{J} < z < \dfrac{k+a}{J},\\[8pt] \dfrac{d\,a}{a-1}\left(z - \dfrac{k}{J} - \dfrac{a+1}{2J}\right), & \text{if } \dfrac{k+a}{J} < z < \dfrac{k+1}{J}. \end{cases} $$

The matrix element $T_{\omega 123}$ has a self-similar structure: $T_{\omega 123} = B(G)\cdot F(a, Z, Y)$,

$$ F(a, Z, Y) = \left[1 + \frac{iY}{Z - iY}\left(1 - \frac{Z}{e^{Z} - 1}\cdot\frac{e^{(1-a)Z + iaY} - 1}{(1-a)Z + iaY}\right)\right]e^{-iaY/2}. \tag{19} $$

Here the amplitude $B$ is a function of $G = \exp(2\gamma Z_a)$ only and is independent of $J$. The shape $F(a, Z, Y)$ is a function of the parameter $a$ and the specific combinations $Z = \ln G/J$ and $Y = d\,\Delta\Omega/J$. For the numerical computation we choose the following parameters: $a = 0.5$, $d = 0.5$, and $\langle d\rangle = 0.005$. Fig. 6 shows the power and spectral power of the true DM solitons obtained as a solution of (13) for the matrix element (19) and $J = 5$.

Fig. 6. The power and the spectral power of the DM soliton for J = 5
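The closed form (19) can be cross-checked against the integral definition (12) by direct quadrature, in the spirit of the approximation (15). The sketch below assumes the normalization $Z_a = 1$, $c_0 = 1$, so that $c(z) = \exp(-z\ln G)$ on $0 < z < 1$, and uses the short-scale $R(z)$ given above; the numerical values of $G$ and $\Delta\Omega$ are arbitrary test inputs, not parameters from the paper.

```python
import cmath
import math

def matrix_element_quadrature(delta_omega, d, a, j, gain, m=400):
    """Midpoint-rule value of the integral (12) for the short-scale two-step map."""
    g = math.log(gain)
    total = 0j
    for k in range(j):
        # Each compensation cell consists of two pieces with constant slope of R(z).
        pieces = [
            (k / j, (k + a) / j, d, a / (2 * j)),
            ((k + a) / j, (k + 1) / j, d * a / (a - 1), (a + 1) / (2 * j)),
        ]
        for z0, z1, slope, shift in pieces:
            h = (z1 - z0) / m
            for i in range(m):
                z = z0 + (i + 0.5) * h
                r = slope * (z - k / j - shift)
                total += h * math.exp(-g * z) * cmath.exp(1j * r * delta_omega)
    return total

def matrix_element_closed(delta_omega, d, a, j, gain):
    """B(G) * F(a, Z, Y) with Z = ln G / J and Y = d * delta_omega / J, formula (19)."""
    g = math.log(gain)
    z = g / j
    y = d * delta_omega / j
    b = (gain - 1.0) / (gain * g)
    w = (1.0 - a) * z + 1j * a * y
    inner = 1.0 - z / (math.exp(z) - 1.0) * (cmath.exp(w) - 1.0) / w
    f = (1.0 + 1j * y / (z - 1j * y) * inner) * cmath.exp(-0.5j * a * y)
    return b * f

def closed_form_mismatch(delta_omega=7.0, d=0.5, a=0.5, j=5, gain=6.9):
    quad = matrix_element_quadrature(delta_omega, d, a, j, gain)
    closed = matrix_element_closed(delta_omega, d, a, j, gain)
    return abs(quad - closed)
```

The same quadrature routine gives the weights $W_m c(z_m)$ and phases $R(z_m)$ that enter (15), so it could serve as a starting point for the fast evaluation when no closed form is available.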

4 TM-model of ordinary differential equations

Let us consider an approach based on an assumption about the local structure of the pulse. Consider the following approximation of the pulse [35]:

$$ A(z,t) = N\,\frac{Q[z,x]}{\sqrt{T(z)}}\exp\left[i\,\frac{M(z)}{T(z)}\,t^2\right], \qquad x = \frac{t}{T(z)}, \tag{20} $$

with

$$ \frac{\partial \arg Q}{\partial x} = 0. \tag{21} $$

In other words, we assume that in the leading order the phase factor $M t^2/T$ in the transformation (20) describes the pulse chirp in the energy-containing central part, and that the RMS width of the transformed field $Q$ does not vary with $z$:

$$ \frac{d}{dz}\langle x^2\rangle = 0, \qquad \text{where } \langle x^2\rangle = \frac{\int x^2|Q|^2\,dx}{\int |Q|^2\,dx}. \tag{22} $$

This can also be considered as an expansion of the field $Q$ into the main self-similar part and a non-self-similar remainder (treated as a small perturbation): $Q(z,x) = Q_0(x)\exp[i\mu(z)] + Q_1(z,x) + \ldots$ with real $Q_0$. It can be shown, following [35, 36], that the evolution of $T(z)$ and $M(z)$ is given by the formulas

$$ \frac{dT}{dz} = 4d(z)M, \qquad \frac{dM}{dz} = \frac{d(z)}{T^3} - \frac{c(z)N^2}{T^2}, \tag{23} $$

where $d(z)$ and $c(z)$ are the same as in (2). This model is called the TM-model. Equations (23) should be solved with periodic boundary conditions:

$$ T(0) = T(L_p), \qquad M(0) = M(L_p). $$

Here $L_p$ is the maximum of $Z_a$ and $L$. The amplitude constant $N^2$ is determined by the condition of periodicity of $T$ and $M$. Varying $T(0)$, we obtain curves in the planes $(N^2, T(0))$ and $(M(0), T(0))$ for the periodic solutions of (23). Now we show that the ODE approach gives a reasonably good approximation of the DM soliton characteristics, making this model useful for practical numerical simulations of periodic transmission lines. We consider a short-scale dispersion map with the parameters given above. Fig. 7 shows the results of the modeling based on the system (23). We have traced how the dependence of the DM soliton peak power on the pulse width changes with the dispersion compensation length, while keeping the same average dispersion and amplification distance. In Fig. 7 the dependence of the DM soliton peak power on the pulse width at the beginning of the compensation section $z = 0$ is shown for different ratios of the dispersion period $L = Z_a/J$ to the amplification distance $Z_a$, which is equal to 40 km: $J = 10$ (solid line), $J = 1$ (long-dashed line), $J = 0.5$ (dashed line), $J = 0.2$ (dotted line), $J = 0.1$ (dashed-dotted line). We also show the peak power dependence for the true DM soliton obtained numerically (in the full model) for $J = 10$ (squares) and $J = 0.2$ (rhombi).
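The system (23) is cheap to integrate numerically. The sketch below uses a classical fourth-order Runge-Kutta step with user-supplied coefficient functions; the function names are illustrative. It is checked only in the constant-coefficient case, where $T = d/(cN^2)$, $M = 0$ is a fixed point and $H = 2dM^2 + d/(2T^2) - cN^2/T$ is conserved; the search for periodic solutions by varying $T(0)$ and $N^2$ is not implemented here.

```python
def tm_rhs(z, width, chirp, d_of_z, c_of_z, n2):
    """Right-hand side of (23): dT/dz = 4 d M, dM/dz = d / T^3 - c N^2 / T^2."""
    d, c = d_of_z(z), c_of_z(z)
    return 4.0 * d * chirp, d / width ** 3 - c * n2 / width ** 2

def integrate_tm(width, chirp, d_of_z, c_of_z, n2, length, steps):
    """Advance (T, M) from z = 0 to z = length with the classical RK4 scheme."""
    h = length / steps
    z = 0.0
    for _ in range(steps):
        k1 = tm_rhs(z, width, chirp, d_of_z, c_of_z, n2)
        k2 = tm_rhs(z + h / 2, width + h / 2 * k1[0], chirp + h / 2 * k1[1], d_of_z, c_of_z, n2)
        k3 = tm_rhs(z + h / 2, width + h / 2 * k2[0], chirp + h / 2 * k2[1], d_of_z, c_of_z, n2)
        k4 = tm_rhs(z + h, width + h * k3[0], chirp + h * k3[1], d_of_z, c_of_z, n2)
        width += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        chirp += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        z += h
    return width, chirp

def tm_invariant(width, chirp, d, c, n2):
    """Conserved quantity of (23) when d and c do not depend on z."""
    return 2.0 * d * chirp ** 2 + d / (2.0 * width ** 2) - c * n2 / width

def constant_d(z):
    return 1.0   # illustrative constant dispersion, not a value from the paper

def constant_c(z):
    return 1.0   # illustrative constant nonlinearity, not a value from the paper
```

For a periodic map one would pass piecewise-constant `d_of_z` and `c_of_z` and presumably wrap `integrate_tm` in a shooting procedure that enforces $T(0) = T(L_p)$, $M(0) = M(L_p)$.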

Fig. 7. The dependence of DM soliton peak power on the pulse width at the beginning of the compensation section z = 0 for different ratios of the dispersion period L = Za/J to amplification distance Za: J = 10 (solid line), J = 1 (long-dashed line), J = 0.5 (dashed line), J = 0.2 (dotted line), J = 0.1 (dashed-dotted line). The same dependencies for the true DM soliton obtained numerically (in the full model): J = 10 (squares), J = 0.2 (rhombi). [Axes: DM soliton peak power in mW versus pulse width in ps.]

5 Conclusions

We have reviewed the mathematical models and algorithms for numerical modeling of periodic dispersion-managed fiber optic communication systems. The first model is the nonlinear Schrödinger equation (NLSE) with periodically varying coefficients. The second approach is the path-averaged model in the spectral domain. The third model is the system of ordinary differential equations governing the evolution of the soliton width and chirp. Using the presented models we analyzed DM soliton solutions for several important practical systems.

The author thanks S.K. Turitsyn, E.G. Shapiro, V.K. Mezentsev, S.B. Medvedev and E.G. Turitsyna for their continual support and cooperation. This research has been supported by President of Russian Federation grant No. NSh-2314.2003.1, Russian Ministry of Education grant No. ZN-080-01 and Siberian Branch of Russian Academy of Sciences integration grant No. 2.

References

1. Zakharov VE, Manakov SV, Novikov SP, Pitaevskii LP (1980) The theory of solitons. The inverse transform method. Nauka, Moscow
2. Newell AC, Moloney JV (1992) Nonlinear optics. Addison-Wesley Publishing Company, Redwood City, CA
3. Hasegawa A, Kodama Y (1995) Solitons in optical communications. Clarendon Press, Oxford
4. Dodd RK, Eilbeck JC, Gibbon JD, Morris HC (1984) Solitons and nonlinear wave equations. Academic Press, New York
5. Mollenauer LF, Mamyshev PV, Neubelt MJ (1996) Demonstration of soliton WDM transmission at up to 8×10 Gbit/s, error-free over transoceanic distances. OFC'96, San Jose, Post Deadline presentation, PD22-1
6. Smith N, Knox FM, Doran NJ, Blow KJ, Bennion I (1996) Electron Lett 32:54–60
7. Suzuki M, Morita I, Edagawa N, Yamamoto S (1995) Electron Lett 31:2027–2028
8. Gabitov I, Turitsyn SK (1996) Opt Lett 21:327–329
9. Georges T, Charbonnier B (1997) IEEE Photon Techn Lett 9:127–129
10. Nijhof JHB, Doran NJ, Forysiak W, Knox WM (1997) Electron Lett 33:1726–1727
11. Smith NJ, Doran NJ, Forysiak W, Knox WM (1997) Opt Lett 21:1981–1983
12. Hasegawa A, Kodama Y, Maruta A (1997) Opt Fiber Technol 3:197–200
13. Zakharov VE, Manakov SV (1999) JETP Lett 70:573–578
14. Turitsyn SK, Shapiro EG, Mezentsev VK (1998) Opt Fiber Techn 4:384–402
15. Turitsyn SK, Shapiro EG, Medvedev SB, Fedoruk MP, Mezentsev VK (2003) Comptes Rendus Physique 4:145–161
16. Turitsyn SK, Fedoruk MP, Gornakova A (1999) Opt Lett 24:869–871
17. Turitsyn SK, Doran NJ, Turitsyna EG, Shapiro EG, Fedoruk MP, Medvedev SB (2000) Optical communication systems with short-scale dispersion management. In: Hasegawa A (ed) Massive WDM and soliton transmission. Kluwer Academic Publishers, Dordrecht
18. Medvedev SB, Shapiro EG, Fedoruk MP, Turitsyna EG (2002) J Exp Theor Physics 121:892–900
19. Turitsyn SK, Fedoruk MP, Forysiak W, Doran NJ (1999) Opt Commun 170:23–27
20. Turitsyn SK, Fedoruk MP, Shapiro EG, Mezentsev VK, Turitsyna EG (2000) IEEE J of Selected Topics in Quantum Electronics 6:263–275
21. Turitsyn SK, Turitsyna EG, Medvedev SB, Fedoruk MP (2000) Phys Rev E 61:3127–3132
22. Shtyrina OV, Medvedev SB, Fedoruk MP (2002) In: Proc. of Int. Conf. on Computational Mathematics, Novosibirsk
23. Medvedev SB, Shtyrina OV, Musher SL, Fedoruk MP (2002) Phys Rev 66
24. Agrawal GP (2001) Nonlinear fiber optics. Academic Press, New York
25. Turitsyn SK, Gabitov I, Laedke EW, Mezentsev VK, Musher SL, Shapiro EG, Schäfer T, Spatschek KH (1998) Opt Commun 151:117
26. Medvedev SB, Turitsyn SK (1999) JETP Lett 69:499–506
27. Gabitov I, Schäfer T, Turitsyn SK (2000) Phys Lett A 265:274–281
28. Liang AH, Toda H, Hasegawa A (1999) Opt Lett 24:799–801
29. Petviashvili VI (1976) Sov J Plasma Phys 2:257–280
30. Petviashvili VI, Pokhotelov OA (1992) Solitary waves in plasmas and in the atmosphere. Gordon & Breach, Philadelphia
31. Ablowitz MJ, Biondini G (1998) Opt Lett 23:384–386
32. Ablowitz MJ, Biondini G, Olson ES (2000) On the evolution and interaction of dispersion-managed solitons. In: Hasegawa A (ed) Massive WDM and soliton transmission. Kluwer Academic Publishers, Dordrecht
33. Lushnikov PM (2001) Opt Lett 26:1535–1537
34. Blahut RE (1985) Fast algorithms for digital signal processing. Addison-Wesley Publishing Company, Reading, MA
35. Gabitov I, Turitsyn SK (1996) JETP Lett 63:861–864
36. Turitsyn SK, Shafer T, Spatschek KH, Mezentsev VK (1999) Opt Commun 163:122–158

Method of particles for incompressible flows with free surface

A.M. Frank

Institute of Computational Modelling SB RAS, Academgorodok, 660036 Krasnoyarsk, Russia

Summary. A review of the particle method and its applications to the simulation of various incompressible flows with a free surface is presented. The method is a free-Lagrange one and is a special kind of Galerkin method, in which convective transport is calculated by means of material particle motion. As a result, the material derivative is calculated in Lagrangian variables, while the mass forces and the inner and surface stresses are calculated in Eulerian variables. The main advantages of the method are good accuracy and efficiency for smooth flows, combined with the possibility to simulate complicated flows in fairly arbitrary regions with moving and free boundaries, including changes in the connectivity of the latter. Problems on surface waves, ball suspension by a thin liquid jet, and thin-film flows on a locally heated substrate are considered.

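1 General equations

A flow of an incompressible homogeneous fluid over a rigid boundary $\Gamma_b$ with a free surface $\Gamma$ is considered (Fig. 1). The governing equations are:

$$ \rho\frac{D\mathbf{u}}{Dt} = \operatorname{div}P + \rho\mathbf{F}, \tag{1} $$

$$ \operatorname{div}\mathbf{u} = 0, \tag{2} $$

$$ \frac{DT}{Dt} = \kappa\nabla^2 T, \tag{3} $$

$$ P = -pI + 2\mu(T)S, \tag{4} $$

$$ \mathbf{F}_c = K\sigma(T)\mathbf{n} + \nabla_\Gamma\sigma(T), \tag{5} $$

$$ \sigma(T) = \sigma_0 - \sigma_1 T, \tag{6} $$

$$ K = \operatorname{div}_\Gamma\mathbf{n} = \operatorname{div}\mathbf{n}, \tag{7} $$

$$ \nabla_\Gamma = \nabla - \mathbf{n}(\mathbf{n}\cdot\nabla). \tag{8} $$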


Fig. 1. The sketch of the flow

Below we give, in particular, an example of the simulation of thermocapillary instability in a heated film flow; therefore the governing equations include the heat equation and a temperature-dependent surface tension. The viscosity $\mu$ can also depend on temperature. Here $D/Dt = \partial/\partial t + \mathbf{u}\cdot\nabla$ is the material derivative, $\rho\mathbf{F}$ is the external body force, $S$ is the strain rate tensor, $T$ is the difference between the fluid temperature and the ambient air temperature, $\kappa$ is the thermal diffusivity, $\mathbf{F}_c$ is the surface stress related to the boundary condition at the deformable free surface $\Gamma$ (see Fig. 1), $K$ is the doubled mean curvature of $\Gamma$, $\mathbf{n}$ is the outward unit normal to $\Gamma$, and $\nabla_\Gamma$ is the surface gradient. The boundary conditions at the free surface are given in Fig. 1, where $b$ is the heat transfer coefficient at the free surface and $c_p$ is the specific heat of the liquid.

We need a variational formulation of the problem in order to obtain the numerical method. Let $\Phi$ be a smooth, compactly supported, divergence-free function vanishing on $\Gamma_b$. Let us denote

$$ A = \operatorname{supp}\Phi\cap\Omega, \qquad \gamma = \partial A\cap\Gamma, \qquad (a, b) = \int_\Omega a\cdot b, $$

where $\Omega$ is the flow region. Using the well-known integral theorems, the boundary conditions and the symmetry of the tensors $P$ and $S$, one obtains:

$$ \begin{aligned} (\operatorname{div}P, \Phi) &= \int_A \operatorname{div}(P\Phi) - \int_A P:\nabla\Phi = \int_{\partial A}\mathbf{n}\cdot P\Phi - \int_A P:\nabla\Phi \\ &= \int_{\partial A}\Phi\cdot P\mathbf{n} - \int_A P:\nabla\Phi = \int_\gamma \Phi\cdot\mathbf{F}_c - 2\int_A \mu(T)S:\nabla\Phi \\ &= \int_\gamma \Phi\cdot\mathbf{F}_c - \int_A \mu(T)S:(\nabla\Phi + \nabla\Phi^T), \end{aligned} \tag{9} $$

where “:” denotes the scalar product of tensors. Thus, multiplying equation (1) by $\Phi$ and integrating over $\Omega$ gives

$$ \left(\frac{D\mathbf{u}}{Dt}, \Phi\right) = \frac{1}{\rho}\int_\Gamma \mathbf{F}_c\cdot\Phi + (\mathbf{F}, \Phi) - \int_\Omega \nu(T)S:(\nabla\Phi + \nabla\Phi^T). \tag{10} $$

One can similarly obtain the following relation from equation (3):

$$ \left(\frac{DT}{Dt}, \Psi\right) = -\kappa(\nabla T, \nabla\Psi) - \frac{b}{\rho c_p}\int_\Gamma T\Psi \tag{11} $$

for any sufficiently smooth, compactly supported function $\Psi$ vanishing on the part of $\Gamma_b$ where the temperature is given.

2 Method of particles

The method of particles for an incompressible fluid was originally suggested for inviscid flows [1] and was derived directly from Gauss' variational principle of least constraint, rather than from the Euler equations for an ideal fluid. A detailed description of the method and examples of its application to different flows with a free surface can be found in [2]. The extension of the method to viscous flows is also briefly outlined therein. Here we show how the method can be constructed for our case of a viscous flow with surface tension as an approximation of the equations (10)–(11).

At $t = 0$ we fill the initial fluid domain $\Omega_0$ with a large number of material particles. Each particle has a mass $m_k$, a position $\mathbf{r}_k$, a velocity $\mathbf{u}_k$ and a temperature $T_k$, and these discrete functions approximate the initial fields of velocity, temperature and unit density. We also use the Galerkin method for the equations (10)–(11). At every time level $n+1/2$ the continuous velocity and temperature fields are sought in the form

$$ \tilde{\mathbf{u}}^{n+1/2} = \sum_\alpha \lambda_\alpha^{n+1/2}\Phi_\alpha + \sum_\gamma \bar\lambda_\gamma\Phi_\gamma, \tag{12} $$

$$ T^{n+1/2} = \sum_\alpha \eta_\alpha^{n+1/2}\Psi_\alpha + \sum_\delta \bar\eta_\delta\bar\Psi_\delta, \tag{13} $$

where the basis functions $\Phi_\alpha$, $\Psi_\alpha$ possess the same properties as $\Phi$, $\Psi$ mentioned above. The second terms in (12), (13) account for some boundary conditions when necessary. Setting $\bar\lambda_\gamma$, $\bar\eta_\delta$ allows one to impose the given inlet velocity field and the bottom temperature distribution. The main advantage of the variational formulation here is that the basis functions $\Phi_\alpha$ and $\Psi_\alpha$ do not have to satisfy any boundary conditions at the unknown free surface $\Gamma$. All these conditions for the fluid velocity and temperature are already naturally included in the equations (10)–(11). In the terminology of finite element methods [3], these boundary conditions are natural, in contrast to the principal ones, like $\mathbf{u} = 0$ and $T = T(x)$ at the bottom.


Since the fluid is incompressible, one can consider the volume integrals (scalar products) in (10)–(11) as being taken either in physical space over the fluid region $\Omega$, which varies in time, or in Lagrangian variables $\mathbf{r}(0, \mathbf{q}) = \mathbf{q}$ over the fixed region $\Omega_0$. For any function $F(t, \mathbf{r})$ we introduce a “particle” approximation of the volume integrals in (10)–(11) as follows:

$$ [F]^{n+1/2} = \sum_k m_k F(t^{n+1/2}, \mathbf{r}_k^{n+1/2}) \approx \int_{\Omega_0} F(t^{n+1/2}, \mathbf{r}(t^{n+1/2}, \mathbf{q}))\,d\mathbf{q}. \tag{14} $$

We also approximate the free surface $\Gamma$ at each time level $n+1/2$ by a smoothing spline $\Gamma_s$, which is constructed through the known positions of surface particles. Therefore, any surface integral $\int_{\Gamma_s} F$ can be approximated using one or another quadrature formula. This approximation is denoted by $\{F\}^{n+1/2}$. Thus, the following numerical scheme for the equations (10)–(11) can be employed:

$$ \mathbf{r}_k^{n+1/2} = \mathbf{r}_k^n + \frac{\tau}{2}\mathbf{u}_k^{n-1/2}, \tag{15} $$

$$ \sum_k m_k\frac{T_k^{n+1} - T_k^n}{\tau}\Psi_\alpha(\mathbf{r}_k^{n+1/2}) = -\kappa[\nabla T\cdot\nabla\Psi_\alpha]^{n+1/2} - \frac{b}{\rho c_p}\{T\Psi_\alpha\}^{n+1/2}, \tag{16} $$

$$ \sum_k m_k\frac{\mathbf{u}_k^{n+1} - \mathbf{u}_k^n}{\tau}\cdot\Phi_\alpha(\mathbf{r}_k^{n+1/2}) = \Big\{\frac{1}{\rho}\mathbf{F}_c\cdot\Phi_\alpha\Big\}^{n+1/2} + [\mathbf{F}\cdot\Phi_\alpha]^{n+1/2} - 2[\nu(T)S:S_\alpha]^{n+1/2}, \tag{17} $$

$$ \mathbf{u}_k^{n+1} = (1+\beta)\,\tilde{\mathbf{u}}^{n+1/2}(\mathbf{r}_k^{n+1/2}) - \beta\mathbf{u}_k^n, \tag{18} $$

$$ T_k^{n+1} = 2T^{n+1/2}(\mathbf{r}_k^{n+1/2}) - T_k^n, \tag{19} $$

$$ \mathbf{r}_k^{n+1} = \mathbf{r}_k^n + \tau\,\tilde{\mathbf{u}}^{n+1/2}(\mathbf{r}_k^{n+1/2}), \tag{20} $$

where $S_\alpha = \frac{1}{2}(\nabla\Phi_\alpha + \nabla\Phi_\alpha^T)$, and $0 \le \beta \le 1$ is a numerical parameter of the scheme (see below). Substituting (18), (19) into (16), (17) and using (12), (13), one obtains two linear algebraic systems for the unknown coefficients in the decompositions (12), (13):

$$ A^{n+1/2}\lambda^{n+1/2} = F_1^{n+1/2}, \tag{21} $$

$$ B^{n+1/2}\eta^{n+1/2} = F_2^{n+1/2}. \tag{22} $$

Here $\lambda^{n+1/2}$, $\eta^{n+1/2}$ denote the vectors composed of all $\lambda_\alpha^{n+1/2}$ and $\eta_\alpha^{n+1/2}$, respectively. The matrices $A$ and $B$ have the following components:

$$ A_{\alpha\beta}^{n+1/2} = [\Phi_\alpha\cdot\Phi_\beta]^{n+1/2} + \frac{2\tau}{1+\beta}[\nu(T)S_\alpha:S_\beta]^{n+1/2}, \tag{23} $$

$$ B_{\alpha\beta}^{n+1/2} = [\Psi_\alpha\Psi_\beta]^{n+1/2} + \frac{\tau\kappa}{2}[\nabla\Psi_\alpha\cdot\nabla\Psi_\beta]^{n+1/2} + \frac{\tau b}{2\rho c_p}\{\Psi_\alpha\Psi_\beta\}^{n+1/2}. \tag{24} $$

Thus, the numerical algorithm at each time step looks as follows. First, intermediate particle positions are found from the predictor (15). Then the matrices and the right-hand sides in (21),(22) are calculated, and these systems are solved in order to find λn+1/2 , η n+1/2 . Finally, new continuous fields ˜ n+1/2 , T n+1/2 and new particle positions, velocities and temperatures are u calculated from (12), (13), (18)–(20). The method has a simple physical interpretation. First, let us consider an inviscid flow without gravity and surface tension. Among all divergence free velocity fields (12) we find from (17) that one, which gives a fluid particle acceleration orthogonal to all solenoidal basis functions. It can be easily shown [2] that it results in finding the divergence free field (12), which is the closest to the free motion velocity field in some discrete L2 norm. On the one hand, this is exactly what Gauss’ principle of least constraint states [4]. On the other hand, this method can be interpreted as a free motion of particles with subsequent projection (correction) of the velocity field onto some finite-dimensional subspace H of solenoidal functions. The similar procedure is well known in fractional step finite-difference schemes for the Navier-Stokes equations [5]. It has been shown [2] that the scheme (15)–(20) without gravity, surface tension and viscosity is unconditionally stable and conserves momentum for all 0 ≤ β ≤ 1. For the inner initial-boundary value problem for the Navier-Stokes equations (without free surface) the convergence for the solutions of (15) – (20) has been proved recently [6]. For β = 1 the time discretization is second order accurate, and the scheme conserves exactly the energy and the angular momentum. It is important that the equation (20) for the particle motion in a given divergence free velocity field is always of second order accuracy, as it is responsible for the residual in the incompressibility constraint |∂r/∂q| = 1. 
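The second-order accuracy of the particle motion step can be illustrated in isolation. The sketch below moves a single particle in a prescribed divergence-free field (rigid rotation) with a half-step predictor followed by the full step (20). As a simplifying assumption it evaluates the given field at the current position in the predictor instead of using the stored particle velocity of (15), and it omits the Galerkin solves (21), (22) entirely; the function names are illustrative.

```python
import math

def rotation_field(x, y):
    """Divergence-free test field: rigid rotation with unit angular velocity."""
    return -y, x

def advect(position, velocity, tau, steps):
    """Half-step predictor followed by the midpoint update r^{n+1} = r^n + tau * u(r^{n+1/2})."""
    x, y = position
    for _ in range(steps):
        ux, uy = velocity(x, y)
        xm, ym = x + 0.5 * tau * ux, y + 0.5 * tau * uy   # intermediate position
        ux, uy = velocity(xm, ym)
        x, y = x + tau * ux, y + tau * uy                 # full step with midpoint velocity
    return x, y

def rotation_error(steps, final_time=1.0):
    """Distance to the exact position (cos t, sin t) of a particle started at (1, 0)."""
    x, y = advect((1.0, 0.0), rotation_field, final_time / steps, steps)
    return math.hypot(x - math.cos(final_time), y - math.sin(final_time))
```

Halving the time step should reduce `rotation_error` by a factor close to four, which is the behaviour expected of a second-order update.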
The point is that we calculate the fluid acceleration in Lagrangian variables, the forces – in Eulerian variables, and the equation (20) accounts for the transformation from the former to the latter ones. Thus, the method is a kind of combination of Galerkin method with the convective transport of quantities by particles. The use of particles gives a simple and convenient way of a free surface tracking. The computational examples given below show that this conservative combination of the particle approach and Galerkin approach allows, on the one hand, to obtain the accurate numerical solutions with very few basis functions for the smooth flows like regular surface waves, and, on the other hand, to simulate by the same method a complicated phenomenon of a rigid ball suspension by a thin liquid jet, where a free surface has a quite irregular shape. Galerkin approach, as it was shown above, permits also to take quite naturally into account the viscous and surface tension forces, which act on a flow due to the corresponding additional terms in the matrix and right-hand side in (21). The general scheme of the method described above leaves a certain freedom in choosing the basis functions Φα , Ψα . It is convenient to employ B-splines in


A.M. Frank

order to obtain the sparse matrices (23)–(24). For the velocity field, however, the B-splines must be solenoidal and satisfy the no-slip boundary conditions on Γb (slip boundary conditions in the inviscid case). Let us consider, as a more general case, the problem of finding a vector potential for an inviscid flow. We seek a vector function $\hat{\Theta}$ such that

$$u = \operatorname{rot}\hat{\Theta}$$

for a solenoidal vector field $u$ whose normal component vanishes on a surface $S$ homeomorphic to a sphere. This problem (either inner or outer) is known to have a unique solution, provided that $u$ and $S$ are smooth enough and

$$\operatorname{div}\hat{\Theta} = 0, \qquad \hat{\Theta}_\tau|_S = 0. \tag{25}$$

Because of the conditions (25), approximating the function $\hat{\Theta}$ by splines is no easier than approximating the velocity field $u$. Therefore, let us consider the function

$$\Theta = \hat{\Theta} + \operatorname{grad}\varphi,$$

where $\varphi$ is an arbitrary smooth function satisfying the following conditions on $S$:

$$\varphi = 0, \qquad \partial\varphi/\partial n = -\hat{\Theta}_n.$$

Such a function always exists for a smooth enough $S$, e.g. as a solution of the corresponding boundary-value problem for the biharmonic equation (we do not need to find this function; only its existence matters). As a result, for any solenoidal field $u$ there always exists a vector potential $\Theta$ (nonunique and nonsolenoidal) that vanishes on $S$. In order to avoid constructing B-splines on a curvilinear grid (consistent with the surface $S$), one can seek the vector potential $\Theta$ in the form

$$\Theta = f\tilde{\Theta}, \tag{26}$$

where $f$ is a smooth scalar "shape function" vanishing on $S$. This function is not equal to zero in the fluid region Ω and has a nonzero bounded normal derivative on $S$. When, for example, Ω is the exterior of a ball (the flow around a sphere, see 3.4), this shape function can be chosen as $f = r^2 - 1$. It is not difficult to see that in the general case of an arbitrary smooth surface $S$ the representation (26) is always possible, with $\tilde{\Theta}$ a function that is continuous up to the surface. In the case of a viscous flow, the shape function $f$ should have a zero normal derivative on $S$ but a nonzero second normal derivative. This ensures that the no-slip condition is satisfied. The function $\tilde{\Theta}$ need not obey any restrictions, including boundary conditions, and therefore it can now be approximated by B-splines on any grid, including a simple rectangular one that is in no way consistent with the surface $S$.
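The role of the shape function can be checked numerically. Since rot(fΘ̃) = grad f × Θ̃ + f rot Θ̃ and f = 0 on S, the velocity on S is orthogonal to grad f, that is, to the normal. The Python sketch below is only an illustration and not the implementation used in the paper: it takes f = r² − 1, an arbitrarily chosen smooth field for Θ̃ (the function `theta_tilde` is hypothetical) and a central-difference curl, and evaluates the normal velocity component at several points of the unit sphere.

```python
import math

def f(p):
    """Shape function for the exterior of the unit ball: f = r^2 - 1."""
    x, y, z = p
    return x * x + y * y + z * z - 1.0

def theta_tilde(p):
    """Arbitrary smooth vector field (hypothetical choice, no boundary conditions)."""
    x, y, z = p
    return (math.sin(y) + z, x * z + 1.0, math.cos(x) - y * y)

def potential(p):
    """Vector potential Theta = f * Theta_tilde, cf. (26)."""
    s = f(p)
    return tuple(s * c for c in theta_tilde(p))

def curl(field, p, h=1e-5):
    """Central-difference approximation of rot(field) at the point p."""
    def d(comp, axis):
        lo, hi = list(p), list(p)
        lo[axis] -= h
        hi[axis] += h
        return (field(tuple(hi))[comp] - field(tuple(lo))[comp]) / (2.0 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def normal_velocity_on_sphere(polar, azimuth):
    """Normal component of u = rot(f * Theta_tilde) at a point of the unit sphere."""
    n = (math.sin(polar) * math.cos(azimuth),
         math.sin(polar) * math.sin(azimuth),
         math.cos(polar))
    u = curl(potential, n)
    return sum(ui * ni for ui, ni in zip(u, n))

# Largest normal velocity over a set of sample points on the sphere.
worst = max(abs(normal_velocity_on_sphere(0.3 + 0.5 * i, 0.7 * j))
            for i in range(5) for j in range(9))
```

In this sketch `worst` stays at the level of the finite-difference error, while the tangential velocity on the sphere is generally nonzero, which is what the slip condition of the inviscid case requires.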

Method of particles for incompressible flows with free surface


Fig. 2. The results of the calculation at t = 6.5

Since $\tilde{\Theta}$ is a vector, three independent functions

$$\tilde{\Theta}_{i1} = (A, 0, 0), \qquad \tilde{\Theta}_{i2} = (0, B, 0), \qquad \tilde{\Theta}_{i3} = (0, 0, C) \tag{27}$$

should be defined at every grid node, where A, B, C are scalar B-splines. The basis functions for the velocity field in (12) can now be obtained as

$$\Phi_\alpha = \operatorname{rot}(f\tilde{\Theta}_\alpha).$$

Sometimes, for example in 3.5, where a thin-film flow over a plane is considered, it is worth choosing the basis functions Θα to be B-splines in the longitudinal and spanwise directions but polynomials in the transverse direction (normal to the plate).
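The projection interpretation given earlier in this section can be illustrated with a small sketch. The Python code below is not the scheme (15)–(20) itself; it only shows, under simplifying assumptions (equal particle weights, a two-function basis, hypothetical particle positions and velocities), how a "free-motion" particle velocity field is projected onto a subspace H spanned by solenoidal functions in a discrete L2 norm. The two basis functions are the ones used for the ellipse problem in 3.1.

```python
def v1(x, y):
    """Solenoidal basis function (x, -y)."""
    return (x, -y)

def v2(x, y):
    """Solenoidal basis function (x^2, -2xy)."""
    return (x * x, -2.0 * x * y)

def project(points, velocities):
    """Least-squares coefficients (c1, c2) of the particle velocities in span{v1, v2}.

    Minimises the sum over particles of |u_p - c1*v1(x_p) - c2*v2(x_p)|^2,
    i.e. a projection in a discrete L2 norm with equal particle weights.
    """
    g11 = g12 = g22 = r1 = r2 = 0.0
    for (x, y), (ux, uy) in zip(points, velocities):
        a = v1(x, y)
        b = v2(x, y)
        g11 += a[0] * a[0] + a[1] * a[1]
        g12 += a[0] * b[0] + a[1] * b[1]
        g22 += b[0] * b[0] + b[1] * b[1]
        r1 += a[0] * ux + a[1] * uy
        r2 += b[0] * ux + b[1] * uy
    # Solve the 2x2 normal equations by Cramer's rule.
    det = g11 * g22 - g12 * g12
    return (r1 * g22 - r2 * g12) / det, (g11 * r2 - g12 * r1) / det

# Hypothetical particles on a small grid in the quarter plane.
points = [(0.2 * i, 0.2 * j) for i in range(1, 6) for j in range(1, 6)]
# A field that already lies in H: 2*v1 - 3*v2.
in_subspace = [(2.0 * x - 3.0 * x * x, -2.0 * y + 6.0 * x * y) for x, y in points]
coefficients = project(points, in_subspace)
# The same field with a disturbance that does not belong to H.
disturbed = [(u + 0.1 * y * y, v + 0.05 * x)
             for (x, y), (u, v) in zip(points, in_subspace)]
projected = project(points, disturbed)
```

For the undisturbed field the projection returns the original coefficients. For the disturbed one, the residual velocity is orthogonal to both basis functions in the particle sum, which is the discrete counterpart of the orthogonality condition described above.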

3 Numerical examples

3.1 Deformable ellipse

The flows considered in 3.1–3.4 are inviscid, and the surface tension is neglected. The most detailed description of these flows can be found in [2]. The first problem is the deformation of a liquid ellipse due to an initial velocity in the absence of body forces. This problem has an exact solution with a linear velocity field [7]:

$$u = K(t)\,x, \qquad v = -K(t)\,y$$

in the region

$$\frac{x^2}{\alpha^2} + y^2\alpha^2 \le 1,$$

where

$$K(t) = \frac{\alpha'(t)}{\alpha(t)}, \qquad \alpha' = \frac{B\alpha^2}{\sqrt{1+\alpha^4}}, \qquad \alpha(0) = A.$$

The equation $x^2/\alpha^2 + y^2\alpha^2 = 1$ defines the free surface. The initial data are chosen as $A = 1$, $B = \sqrt{2}$, so that the initial region is a unit circle. Owing to the symmetry, only a quarter of the region is considered. Fig. 2 shows the results of the calculation at t = 6.5. For the sake of convenience the calculated


picture is stretched in the Y direction and squeezed in the X direction by the factor $\sqrt{R}$, where $R = \alpha^2$ is the ratio of the ellipse axes. That is why it looks like a circle again, although here R is approximately equal to 100. The solid line shows the exact position of the free boundary. Two functions

$$v_1(x, y) = (x, -y), \qquad v_2(x, y) = (x^2, -2xy)$$

are taken as the basis of the space H. The time step is τ = 0.1. It should be noted that this problem, in spite of its apparent simplicity, usually causes difficulties for numerical methods because the curvature of the free surface increases with time. For the present method it turned out to be trivial, because the velocity field remains linear and there is no need to impose the dynamic boundary condition at the curvilinear free surface.

3.2 Soliton

The second problem is the simulation of a solitary wave. It is well known that, for solitary waves, assuming the horizontal velocity to be constant across the fluid layer (so that the vertical velocity depends linearly on the vertical coordinate y) gives a good approximation. That is why the basis functions

$$v_i(x, y) = \operatorname{rot}(0, 0, B_i(x)\,y)$$

are used here, where $B_i(x)$ are B-splines. This choice of basis reduces the problem to a one-dimensional one. Indeed, since the longitudinal velocity $\tilde{u}$ does not depend on y, all the fluid particles along any vertical line move with the same velocity $\tilde{u}$. Their vertical velocity is uniquely determined by the velocity of the surface particle. Hence, it is enough to place particles only on the free surface. Therefore, when approximating equation (10), one should replace the integral over x by a sum over all particles, while the integration over y can be carried out explicitly. Fig. 3 presents the profile of the calculated solitary wave of amplitude a/d = 0.3 (d is the fluid depth) at the dimensionless time $t\sqrt{g/d} = 100$. Here third-order B-splines $B_i(x)$ are used, built on a uniform grid with mesh size h/d = 1. The initial distance between

Fig. 3. The propagating soliton. The solid line is the exact shape of Rayleigh's soliton of the same amplitude


the particles is $h_p = h/3$, and the dimensionless time step is τ = 0.5. Thus the numerical resolution is very coarse and the time step is large. The calculated dimensionless phase speed is C = 1.14186, which differs from $\sqrt{1 + a/d}$ by 0.15%. It should also be noted that if the totally conservative scheme is used (β = 1), the numerical wave is a real soliton, which can propagate indefinitely without any loss of amplitude.

3.3 Traveling periodic waves

The next example is the simulation of a traveling periodic wave. Here waves of large amplitude in a fluid of infinite depth are calculated. The initial data are taken from Schwartz's solution [8]. This is a fully two-dimensional problem, so the basis functions

$$v_{ij}(x, y) = \operatorname{rot}(0, 0, B_i(x)B_j(y))$$

are chosen, where $B_k(\xi)$ are B-splines of degree 2 on a nonuniform grid. Fig. 4 presents the results of the progressive wave simulation up to the dimensionless time $t\sqrt{gk} = 5.8$ (about one wave period). Here k is the wavenumber, and the wave height is hk = 0.8, which is about 90% of that of the highest wave in deep water. The solid line is the exact free surface position from Schwartz's solution. In this run the grid has 20 × 10 cells (which corresponds to the total number of basis functions), the number of particles is N = 2135, and the dimensionless time step is τ = 0.2.
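Two of the figures quoted in 3.1 and 3.2 can be checked with a few lines of code. The Python sketch below is an independent check, not part of the particle method. It integrates the equation for α(t) of the exact ellipse solution with a classical Runge-Kutta scheme (the step count is an arbitrary choice) to obtain the axes ratio R = α² at t = 6.5 for A = 1, B = √2, and it evaluates the relative difference between the computed soliton phase speed C = 1.14186 and √(1 + a/d) for a/d = 0.3.

```python
import math

def alpha_rate(alpha, b):
    """Right-hand side of alpha' = B * alpha^2 / sqrt(1 + alpha^4)."""
    return b * alpha * alpha / math.sqrt(1.0 + alpha ** 4)

def integrate_alpha(a0, b, t_end, steps=6500):
    """Classical fourth-order Runge-Kutta integration of alpha(t), alpha(0) = a0."""
    dt = t_end / steps
    alpha = a0
    for _ in range(steps):
        k1 = alpha_rate(alpha, b)
        k2 = alpha_rate(alpha + 0.5 * dt * k1, b)
        k3 = alpha_rate(alpha + 0.5 * dt * k2, b)
        k4 = alpha_rate(alpha + dt * k3, b)
        alpha += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return alpha

# 3.1: axes ratio R = alpha^2 at t = 6.5 for A = 1, B = sqrt(2).
alpha_end = integrate_alpha(1.0, math.sqrt(2.0), 6.5)
axes_ratio = alpha_end ** 2

# 3.2: relative difference between C = 1.14186 and sqrt(1 + a/d), a/d = 0.3.
reference_speed = math.sqrt(1.0 + 0.3)
relative_difference = abs(1.14186 - reference_speed) / reference_speed
```

Both results are consistent with the text: the axes ratio comes out close to 100, and the relative difference in the phase speed rounds to 0.15%.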

Fig. 4. The progressive wave in the deep water

Fig. 5. The breaking wave. The values of the dimensionless time $t\sqrt{gk}$ are shown in the pictures


As an example of a more complicated problem, Fig. 5 presents some results of a wave breaking simulation. Here the same basis functions are used; their number is about 500, and the number of particles is N = 8470. The breaking is initiated by a nonstationary surface pressure.

3.4 Ball in a jet

Since 1870, when Osborne Reynolds studied the suspension of a ball by a vertical water jet, this effect has been well known as a classical example of an interesting hydrodynamic phenomenon. Still, only a few works concerning this problem have appeared since then (see the references in [2]). The problem is hard both for analytical investigation and for the direct numerical solution of the Euler or Navier-Stokes equations, because it involves the interaction of a body with a rather complicated free surface flow. To simplify the 3D calculations, basis functions of minimal admissible smoothness are used here. The corresponding B-splines in (27) are of zero order along their "own" direction and linear along the other ones, for example $A = L_0(x)L_1(y)L_1(z)$. To estimate the accuracy, a 2D test problem with an exact solution, a potential jet flow around a rigid surface γ [9], is solved. Fig. 6 compares the flow geometry and the velocity isolines for different numerical resolutions. In the 3D simulations the fluid density ρj and the ball radius R are set to unity. The coordinate system is Cartesian with the Z-axis directed upward. In order to prevent the calculations from involving an extremely large number of particles, the computational domain around the ball is bounded by a sphere of

Fig. 6. 2D test problem - the jet flow around a surface. The solid lines are the isolines of the velocity absolute value from the exact solution


radius 2.5. The particles leaving this region are discarded. A rectangular uniform mesh with mesh size h = 0.4 or h = 0.2 is used for the majority of the calculations. Both the mesh and the computational domain move together with the ball. The inlet jet velocity V is set uniform over the cross-section at a fixed distance 2R from the ball center. At the initial instant t = 0 it is set to V = 1, and its value is then changed according to the Bernoulli integral as the ball is displaced vertically. That is why the ball always finds its equilibrium state regardless of the initial value of the Froude number Fr = 1/g.
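The update of the inlet velocity can be sketched as follows. The text does not give the formula, so this Python fragment is only an assumption about its form: it applies the Bernoulli integral for a free jet at constant pressure, V²/2 + gz = const, with the vertical displacement `dz` of the inlet section measured from its position at t = 0. The function name and the value of the Froude number are hypothetical.

```python
import math

def inlet_velocity(dz, g, v0=1.0):
    """Jet speed at the inlet section after it has risen by dz.

    Assumes V^2 / 2 + g * z = const along the jet, so that
    V(dz)^2 = v0^2 - 2 * g * dz.
    """
    v_squared = v0 * v0 - 2.0 * g * dz
    if v_squared < 0.0:
        raise ValueError("the jet does not reach this height")
    return math.sqrt(v_squared)

froude = 4.0          # hypothetical value of Fr = 1/g
g = 1.0 / froude
speeds = [inlet_velocity(dz, g) for dz in (0.0, 0.5, 1.0)]
```

Under this assumption the inlet velocity decreases as the ball rises and increases as it sinks, which is the restoring mechanism that lets the ball settle at an equilibrium height.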

Fig. 7. The initial stage of the jet flow around a ball

Fig. 8. The trajectories of the ball center motion in a horizontal plane


The initial configuration and an example of the first flow stage are shown in Fig. 7. As a disturbance, a horizontal displacement X0 of the ball from the jet axis in the XZ-plane is imposed at t = 0. All calculations are continued until the ball reaches its almost stationary vertical position, the amplitude of the oscillations stops growing and three consecutive periods of oscillation are close enough. This usually takes about 6–10 periods. Some calculations are performed for up to 25 periods.

Fig. 9. The dimensionless period of the ball oscillations versus the jet radius

Fig. 10. The sketch of a locally heated film flow


Fig. 11. The time history of the 3D structure formation. The values of the dimensionless time (viscous scale) are given. Reproduced from [10] with permission from Elsevier, 2003

As a rule, the ball motion leaves the plane of the initial displacement, i.e. a Y component of the velocity appears. Moreover, the ball motion resembles a rotation along a trajectory that is close to an ellipse with a precessing axis. This can be seen in Fig. 8, where the trajectories of the ball motion in the XY-plane are plotted for the fixed ball density ρb/ρj = 1.014 and two different values of the dimensionless jet radius. The vertical position of the ball is not steady either: the ball moves up and down around some equilibrium position. All this is also observed in a simple experiment that was carried out to check the numerical simulations. Fig. 9 compares the calculated (for two different meshes) and measured dimensionless period of the ball oscillations versus the jet radius.


Fig. 12. I (experiment): Schlieren images [12] of the film surface at Re = 0.5 for different heat flux values q: a, q = 0.3 W/cm²; b, q = 0.63 W/cm²; c, q = 1.12 W/cm²; d, q = 1.21 W/cm²; e, q = 1.5 W/cm²; f, q = 2.11 W/cm² (O. Kabov et al., 2001). Reproduced from [12] with the permission of the authors. II (calculation): calculated pictures of the film surface at Re = 0.5, Bi = 0 for different heat flux values q: b, q = 0.63 W/cm²; c1, q = 1.07 W/cm²; c2, q = 1.12 W/cm²; d, q = 1.20 W/cm²; e, q = 1.46 W/cm²; f, q = 2.12 W/cm². The dashed lines show the heater region. Reproduced from [10] with permission from Elsevier, 2003

Further details and comparisons for this problem can be found in [2]. The physical mechanism of the stable ball suspension is also discussed there.

3.5 Locally heated falling film

A liquid film flowing over an inclined rigid plate is considered (see [10] for details). The flow of the viscous incompressible fluid is driven by gravity and is also strongly affected by thermocapillary forces due to local heating at the bottom (Fig. 10).


The main objective of this work was to simulate and investigate the 3D bifurcation of the flow due to the thermocapillary instability, which leads to the formation of a regular 3D structure. This effect was recently discovered and studied experimentally [11], [12] (see more references in [10]). It was observed that, if the heating is low enough, only a 2D flow with a bump at the front edge of the heater is present. For a larger heat flux this primary flow becomes unstable, and the instability leads to another steady or quasi-steady 3D flow. The flow looks like a regular structure with a periodically bent leading bump and longitudinal rolls, or rivulets, descending from it downstream with a much thinner liquid film between them. Fig. 11 gives an example of the calculated time history of the 3D structure formation. Fig. 12 shows the comparison with the experiment. The results of the calculations [10] agree qualitatively and quantitatively with the observations in several respects. The 3D instability occurs for a heat flux greater than some threshold, and this value is well predicted by the numerical model. The predicted characteristic width of the structure agrees quantitatively with the observed one within the limits of the observation accuracy. It is shown that even a fairly large heat transfer at the interface only slightly affects the structure width, which agrees qualitatively with the experimental observations. Finer features, such as the increase in the width of the structure and its upstream shift with more intensive heating, are also captured in the calculations. The existence of some nonstationary regimes observed in the experiment has been demonstrated as well. Non-zero heat transfer at the interface is shown to cause additional rolls to appear between the main ones.
The simulation revealed an interesting flow structure, including strong counter-directed thermocapillary currents across the crest of a longitudinal roll, caused by the spanwise temperature gradients.

References

1. Frank A, Ogorodnikov E (1992) Sov Phys Dokl 37:489–491
2. Frank A (2001) Discrete models of incompressible fluid. Fizmatlit, Moscow (in Russian)
3. Strang G, Fix G (1973) An analysis of the finite element method. Prentice-Hall, Englewood Cliffs NJ
4. Appell P (1960) Theoretical mechanics. Fizmatgiz, Moscow (in Russian)
5. Temam R (1979) Navier-Stokes equations, theory and numerical analysis. North-Holland, Amsterdam
6. Ovchinnikova E, Frank A (2004) Computational technologies 9(1):58–74 (in Russian)
7. Ovsyannikov L (1967) General equations and examples. In: Problems of unsteady motion of a fluid with free boundaries. Nauka, Novosibirsk (in Russian)
8. Schwartz L (1974) J Fluid Mech 62:553–578
9. Gurevich M (1979) Theory of jets for ideal fluid. Nauka, Moscow (in Russian)
10. Frank A (2003) Europ J Mech B/Fluids 22:445–471
11. Kabov O, Marchuk I, Chupin V (1996) Russ J Eng Thermophys 2:105–138
12. Kabov O, Legros J, Marchuk I, Scheid B (2001) Fluid Dynam 36:521–528

Direct and inverse problems in the mechanics of composite plates and shells

S.K. Golushko

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. Various models of fibrous composites are considered: the filament model, the model with one-dimensional fibres, the specified model with one-dimensional fibres and the model with two-dimensional fibres. The influence of the model of the composite material on the stressed-deformed state of composite constructions is demonstrated. Problems of rational design of composite plates and shells are formulated. A number of analytical solutions to problems of rational design of nodoid shells are obtained using the criteria of a momentless stressed-deformed state and of equistressed reinforcement.

1 Introduction

Composite materials (CM) are characterized by two levels of inhomogeneity: microinhomogeneity, connected with the presence of two phases (a matrix and a filler), and macroinhomogeneity, connected with the presence in the material of differently oriented microinhomogeneous layers. The main goal of the micromechanics of composites is to determine the effective elastic moduli, i.e. the coefficients relating the volume-averaged stresses and deformations. There are two basic approaches to determining these coefficients: phenomenological and structural. In the phenomenological approach the microstructure of the composite is practically ignored, and the material is treated as conditionally homogeneous, with some set of experimentally determined constants. The structural approach is based on an analysis of the CM in terms of its structure and the mechanical properties of its components. In view of the second level of inhomogeneity of CM, preference should be given to the structural approach, since the phenomenological approach cannot be realized in this case. Moreover, once the stressed-deformed state (SDS) of a construction has been determined, the structural approach makes it possible to obtain the stresses in the elements of the composition. This in turn makes it possible to investigate local effects in the binding and the reinforcement and on the boundary between them, to determine


the character of destruction, and to pose and solve problems of rational design of constructions made of CM. To date, a great number of micromodels of composites of varying complexity have been developed. They can be divided into the following groups:
— structural models, which take into account the physical and mechanical properties and volume fractions of the components and the directions of the fibres;
— self-consistent models, in which the composite is represented as a single fibre surrounded by an infinite medium;
— models that take into account the shape of the fibres and assume their regular arrangement;
— energy models, which give upper and lower bounds for the effective elastic moduli;
— statistical models, based on the assumption of a random distribution of fibres.
The literature on the micromechanics of composites is rather extensive; we mention only some of the books [1–15]. When solving direct problems of calculating the SDS of real composite constructions, and especially when solving inverse problems of their optimal and rational design, researchers are compelled to rely on models that describe the basic properties of composites and at the same time have the simplest form, i.e. models of the first type. The structural models [16–18] considered below satisfy these requirements.

2 Structural models of composite materials

Let us consider a structural model whose equations are derived under the following assumptions [16]:

Fig. 1. The polyreinforced layer

— the polyreinforced layer (Fig. 1) is an isotropic elastic homogeneous binding with a regular network of unidirectional elastic isotropic fibres embedded in it;

Fig. 2. (a) The filament model; (b) the model with one-dimensional fibres; (c) the specified model with one-dimensional fibres; (d) the model with two-dimensional fibres

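— the number of reinforcing fibres is large enough for the reinforced layer to be regarded as quasi-homogeneous;
— the reinforcing fibres have a rectangular cross section and are in ideal contact with the binding;
— the reinforcing fibres embedded in the binding carry both tensile and compressive forces;
— during deformation the elongations and shears remain small in comparison with unity.

Fig. 2a corresponds to the filament model, Fig. 2b to the model with one-dimensional fibres, Fig. 2c to the specified model with one-dimensional fibres and Fig. 2d to the model with two-dimensional fibres. For an orthotropic material, the relations between the averaged stresses and deformations in the k-th reinforced layer have the form

$$
\begin{aligned}
\sigma^k_{\alpha\alpha} &= a^k_{\alpha\alpha} e^k_{\alpha\alpha} + a^k_{\alpha\beta} e^k_{\beta\beta} + a^k_{\alpha 3} e^k_{\alpha\beta},\\
\sigma^k_{\alpha\beta} &= a^k_{\alpha 3} e^k_{\alpha\alpha} + a^k_{\beta 3} e^k_{\beta\beta} + a^k_{33} e^k_{\alpha\beta}, \qquad
\tau^k_{\alpha 3} = G^k_{\alpha 3}\gamma^k_{\alpha 3},\\
a^k_{\alpha\alpha} &= \bar{a}^k_z E^k_c + \sum_{n_k=1}^{N_k} a^{n_k}_{\alpha\alpha}, \qquad
a^k_{\alpha\beta} = \bar{a}^k_z \nu^k_c E^k_c + \sum_{n_k=1}^{N_k} a^{n_k}_{\alpha\beta},\\
(G^k_{\alpha 3})^{-1} &= \frac{a^k_z}{G^k_c} + \sum_{n_k=1}^{N_k} g^{n_k}_{\alpha 3},
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
a^k_{\alpha 3} &= \sum_{n_k=1}^{N_k} a^{n_k}_{\alpha 3}, \qquad
a^k_{33} = \frac{a^k_z E^k_c}{2(1+\nu^k_c)} + \sum_{n_k=1}^{N_k} a^{n_k}_{33}, \qquad (\alpha, \beta = 1, 2;\ \alpha \ne \beta),\\
a^{n_k}_{\alpha\alpha} &= A^{n_k}_{1111} l^4_{\alpha n_k} + 2(A^{n_k}_{1122} + 2A^{n_k}_{1212})\, l^2_{\alpha n_k} l^2_{\beta n_k} + A^{n_k}_{2222} l^4_{\beta n_k},\\
a^{n_k}_{\alpha\beta} &= (A^{n_k}_{1111} + A^{n_k}_{2222} - 4A^{n_k}_{1212})\, l^2_{\alpha n_k} l^2_{\beta n_k} + A^{n_k}_{1122}(l^4_{\alpha n_k} + l^4_{\beta n_k}),\\
a^{n_k}_{\alpha 3} &= \big[A^{n_k}_{1111} l^2_{\alpha n_k} - A^{n_k}_{2222} l^2_{\beta n_k} - (A^{n_k}_{1122} + 2A^{n_k}_{1212})(l^2_{\alpha n_k} - l^2_{\beta n_k})\big]\, l_{\alpha n_k} l_{\beta n_k},\\
a^{n_k}_{33} &= (A^{n_k}_{1111} + A^{n_k}_{2222} - 2A^{n_k}_{1122})\, l^2_{1 n_k} l^2_{2 n_k} + A^{n_k}_{1212}(l^2_{1 n_k} - l^2_{2 n_k})^2,\\
A^{n_k}_{1111} &= \omega^{n_k}_z\big[\omega_{n_k} E^{n_k}_a + \omega^c_{n_k} E^k_c + \delta E^k_c E^{n_k}_a(\omega_{n_k}\nu^{n_k}_a + \omega^c_{n_k}\nu^k_c)^2\chi^{-1}_{n_k}\big],\\
A^{n_k}_{1212} &= \delta\omega^{n_k}_z E^k_c E^{n_k}_a\zeta^{-1}_{n_k},\\
A^{n_k}_{1122} &= \delta\omega^{n_k}_z E^k_c E^{n_k}_a(\omega_{n_k}\nu^{n_k}_a + \omega^c_{n_k}\nu^k_c)^2\chi^{-1}_{n_k}, \qquad
A^{n_k}_{2222} = \delta\omega^{n_k}_z E^k_c E^{n_k}_a\chi^{-1}_{n_k},\\
g^{n_k}_{\alpha 3} &= g^{n_k}_{11} l^2_{\alpha n_k} + g^{n_k}_{22} l^2_{\beta n_k}, \qquad
g^{n_k}_{11} = 2\delta\omega^{n_k}_z(1+\nu^k_c)(1+\nu^{n_k}_a)\eta^{-1}_{n_k}, \qquad
g^{n_k}_{22} = \delta\omega^{n_k}_z\zeta_{n_k}(E^k_c E^{n_k}_a)^{-1},\\
\chi_{n_k} &= \omega_{n_k}[1 - (\nu^{n_k}_a)^2]E^k_c + \omega^c_{n_k}[1 - (\nu^k_c)^2]E^{n_k}_a,\\
\eta_{n_k} &= \omega_{n_k}(1+\nu^k_c)E^{n_k}_a + \omega^c_{n_k}(1+\nu^{n_k}_a)E^k_c,\\
\zeta_{n_k} &= 2\big[\omega_{n_k}(1+\nu^{n_k}_a)E^k_c + \omega^c_{n_k}(1+\nu^k_c)E^{n_k}_a\big],\\
\bar{a}^k_z &= \frac{a^k_z}{1 - (\nu^k_c)^2}, \qquad a^k_z = 1 - \omega^k_z, \qquad
G^k_c = \frac{E^k_c}{2(1+\nu^k_c)}, \qquad \omega^k_z = \sum_{n_k=1}^{N_k}\omega^{n_k}_z,\\
l_{1 n_k} &= \cos\psi_{n_k}, \qquad l_{2 n_k} = \sin\psi_{n_k}, \qquad \omega^c_{n_k} = 1 - \omega_{n_k}.
\end{aligned}
\tag{2}
$$

Here $\sigma^k_{\alpha\alpha}$, $\sigma^k_{\alpha\beta}$, $\tau^k_{\alpha 3}$, $e^k_{\alpha\alpha}$, $e^k_{\alpha\beta}$, $\gamma^k_{\alpha 3}$ are the components of the stress and deformation tensors of the k-th layer; $E^k_c$, $E^{n_k}_a$, $\nu^k_c$, $\nu^{n_k}_a$ are the Young's moduli and Poisson's ratios of the binding material and of the n-th family of reinforcement in the k-th layer; $\omega^{n_k}_z$, $\omega_{n_k}$, $\omega^c_{n_k}$ are the intensities of reinforcement in the surface and in the thickness direction of the shell for the n-th family of reinforcement and for the binding in the reinforced layer; $\psi_{n_k}$ is the reinforcement angle of the n-th family in the k-th layer. The model with two-dimensional fibres [16] is obtained at δ = 1 using all terms in relations (2). At δ = 0 we have the specified model with one-dimensional fibres [17]; at δ = 0 and $\omega^c_{n_k} = 0$ we obtain the model with one-dimensional fibres [18]; and at δ = 0, $\omega^c_{n_k} = 0$, $a^k_z = 0$ we obtain the filament model [19].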
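The direction-cosine formulas in relations (2) are the usual rotation of an orthotropic stiffness from the fibre axes to the layer axes. The Python sketch below evaluates the contribution of a single fibre family for a given angle ψ. It is an illustration only: the function name and the numerical stiffness values are hypothetical, and the coefficients A1111, A1122, A2222, A1212 are taken as given inputs rather than computed from the material constants.

```python
import math

def family_stiffness(a1111, a1122, a2222, a1212, psi):
    """Contribution of one fibre family, laid at the angle psi, to the layer
    stiffness coefficients (direction-cosine formulas of relations (2)).

    Returns a dict with keys "11", "22", "12", "13", "23", "33".
    """
    l1, l2 = math.cos(psi), math.sin(psi)
    coupled = a1122 + 2.0 * a1212

    def normal(la, lb):          # a_{alpha alpha}
        return a1111 * la ** 4 + 2.0 * coupled * la ** 2 * lb ** 2 + a2222 * lb ** 4

    def shear_coupling(la, lb):  # a_{alpha 3}
        return (a1111 * la ** 2 - a2222 * lb ** 2
                - coupled * (la ** 2 - lb ** 2)) * la * lb

    return {
        "11": normal(l1, l2),
        "22": normal(l2, l1),
        "12": (a1111 + a2222 - 4.0 * a1212) * l1 ** 2 * l2 ** 2
              + a1122 * (l1 ** 4 + l2 ** 4),
        "13": shear_coupling(l1, l2),
        "23": shear_coupling(l2, l1),
        "33": (a1111 + a2222 - 2.0 * a1122) * l1 ** 2 * l2 ** 2
              + a1212 * (l1 ** 2 - l2 ** 2) ** 2,
    }

# Hypothetical stiffness values of one family in its own axes (arbitrary units).
aligned = family_stiffness(100.0, 3.0, 10.0, 4.0, 0.0)
crossed = family_stiffness(100.0, 3.0, 10.0, 4.0, math.pi / 2.0)
```

At ψ = 0 the layer coefficients coincide with the fibre-axis values, and at ψ = π/2 the two normal stiffnesses exchange places. For an isotropic set of inputs, with A1111 = A2222 and A1212 = (A1111 − A1122)/2, the result does not depend on ψ, which is a convenient sanity check of the formulas.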


3 The basic system of equations for composite shells

Problems of the strength of CM have been studied by many authors and are widely covered in the literature [7, 10, 22–30]. As in the construction of models of CM, two basic approaches to establishing strength criteria can be distinguished: phenomenological and structural. Within the first, the CM is treated as a quasi-homogeneous elastic medium for which a strength criterion is postulated. The parameters entering its mathematical formulation are found from experimental data. Among the phenomenological strength criteria an important place is occupied by the tensor-polynomial criterion [25], which generalizes practically all known phenomenological criteria. It should be noted, however, that even for rather simple kinds of SDS, laborious programs of experiments and mathematical processing are required to obtain the data. Another shortcoming of such criteria is that they are formulated in terms of average stresses, which makes it impossible to reveal the mechanism by which initial destruction occurs or to predict the direction of its further development. The structural approach is free from these shortcomings. It is based on studying the stresses in the elements of the substructure, for each of which some strength criterion is adopted. After the average characteristics of the SDS in a construction have been determined, the stresses in the elements of the CM are recovered with the help of the equations of the structural model. In this way the destroying intensities of the external loads are calculated for all elements of the composite, and the smallest of them is taken as the load of initial destruction. This approach makes it possible to assess the efficiency of the binding and reinforcing elements and to specify rational reinforcement parameters with respect to strength. Let us consider the structural strength criterion for fibrous CM that will be used below in the concrete calculations.
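Along with the assumptions formulated in the previous section, we accept the following two postulates:
— the adhesive strength of the binding is not lower than its cohesive strength;
— the materials of the binding and of the reinforcing elements obey the Mises strength (plasticity) conditions:

$$M(\sigma^c_{\alpha\alpha}, \sigma^c_{\alpha\beta}, \tau^c_{\alpha 3}) = (\sigma^*_c)^2, \qquad M(\sigma^a_{\alpha\alpha}, \sigma^a_{\alpha\beta}, \tau^a_{\alpha 3}) = (\sigma^*_a)^2. \tag{3}$$

Here $\sigma^c_{\alpha\alpha}$, $\sigma^c_{\alpha\beta}$, $\tau^c_{\alpha 3}$, $\sigma^*_c$ and $\sigma^a_{\alpha\alpha}$, $\sigma^a_{\alpha\beta}$, $\tau^a_{\alpha 3}$, $\sigma^*_a$ are, respectively, the stress components and the tensile strengths of the binding and of the reinforcing elements, and $M(\sigma_{\alpha\alpha}, \sigma_{\alpha\beta}, \tau_{\alpha 3})$ is a positive definite quadratic form of these components:

$$M(\sigma_{\alpha\alpha}, \sigma_{\alpha\beta}, \tau_{\alpha 3}) = \sigma_{11}^2 - \sigma_{11}\sigma_{22} + \sigma_{22}^2 + 3\sigma_{12}^2 + 3\tau_{13}^2 + 3\tau_{23}^2. \tag{4}$$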

The scheme for solving a linear strength problem based on these relations is as follows. Let the plate or shell under consideration be assembled from K different fibre-reinforced layers and loaded by a system of external forces whose intensities are proportional to a scalar parameter P. By virtue of the linearity of the differential equations and boundary conditions of the corresponding boundary value problem of the statics of plates and shells,

$$
\begin{aligned}
\sigma^{c(k)}_{\alpha\alpha} &= P\bar{\sigma}^{c(k)}_{\alpha\alpha}, & \sigma^{c(k)}_{\alpha\beta} &= P\bar{\sigma}^{c(k)}_{\alpha\beta}, & \tau^{c(k)}_{\alpha 3} &= P\bar{\tau}^{c(k)}_{\alpha 3},\\
\sigma^{a(k)}_{\alpha\alpha} &= P\bar{\sigma}^{a(k)}_{\alpha\alpha}, & \sigma^{a(k)}_{\alpha\beta} &= P\bar{\sigma}^{a(k)}_{\alpha\beta}, & \tau^{a(k)}_{\alpha 3} &= P\bar{\tau}^{a(k)}_{\alpha 3},
\end{aligned}
\tag{5}
$$

where the barred quantities are the stresses corresponding to P = 1. Further, from (3)–(5) the loads of initial destruction of the binding, $P_c^{(k)}$, and of the reinforcing elements, $P_a^{(k)}$, of the k-th layer of the plate or shell can be found:

$$
P_c^{(k)} = \frac{\sigma_c^{*(k)}}{\sqrt{\sup\limits_{V_k} M(\bar{\sigma}^{c(k)}_{\alpha\alpha}, \bar{\sigma}^{c(k)}_{\alpha\beta}, \bar{\tau}^{c(k)}_{\alpha 3})}}, \qquad
P_a^{(k)} = \frac{\sigma_a^{*(k)}}{\sqrt{\sup\limits_{V_k} M(\bar{\sigma}^{a(k)}_{\alpha\alpha}, \bar{\sigma}^{a(k)}_{\alpha\beta}, \bar{\tau}^{a(k)}_{\alpha 3})}},
\tag{6}
$$

where $V_k$ is the volume occupied by the k-th layer. The loads of initial destruction of the k-th layer, $P^{(k)}$, and of the multilayered composite plate or shell, $P^*$, are determined by the relations

$$P^{(k)} = \min(P_c^{(k)}, P_a^{(k)}), \qquad P^* = \min_{1 \le k \le K} P^{(k)}. \tag{7}$$

The formulas

$$P_c^* = \min_{1 \le k \le K} P_c^{(k)}, \qquad P_a^* = \min_{1 \le k \le K} P_a^{(k)} \tag{8}$$

determine the loads of initial destruction of the binding, $P_c^*$, and of the reinforcing elements, $P_a^*$, for the composite plate or shell as a whole.

For a linear bending problem the stresses $\sigma^{c(k)}_{\alpha\alpha}$, $\sigma^{c(k)}_{\alpha\beta}$, $\tau^{c(k)}_{\alpha 3}$, $\sigma^{a(k)}_{\alpha\alpha}$, $\sigma^{a(k)}_{\alpha\beta}$, $\tau^{a(k)}_{\alpha 3}$ in all layers of the shell and in all components of the composite can be represented in the form (5); for a nonlinear problem the situation changes. In that case these quantities are functions not only of the coordinates but also of the loading parameter. This makes it necessary to resort to more complex procedures for determining the loads of initial destruction [1].
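For the linear problem, the chain (4)–(8) reduces to a few lines of arithmetic once the unit-load stresses are known. The Python sketch below illustrates that chain under stated assumptions: the stresses for P = 1 are supplied as a finite list of sampled stress states per layer, so the supremum over the layer volume is replaced by a maximum over the samples, and all names and numerical values are hypothetical.

```python
import math

def mises_form(s11, s22, s12, t13, t23):
    """Quadratic form M of relation (4)."""
    return (s11 * s11 - s11 * s22 + s22 * s22
            + 3.0 * (s12 * s12 + t13 * t13 + t23 * t23))

def initial_destruction_load(strength, unit_load_stresses):
    """Load parameter at which the Mises condition (3) is first met, cf. (6).

    unit_load_stresses: stress states (s11, s22, s12, t13, t23) sampled in the
    layer for P = 1; the maximum over the samples stands in for the supremum.
    """
    worst = max(mises_form(*state) for state in unit_load_stresses)
    return strength / math.sqrt(worst)

def laminate_loads(layers):
    """Return (P_star, P_c_star, P_a_star) as in relations (7) and (8)."""
    p_c = [initial_destruction_load(layer["sigma_c"], layer["binding"])
           for layer in layers]
    p_a = [initial_destruction_load(layer["sigma_a"], layer["fibres"])
           for layer in layers]
    p_layers = [min(c, a) for c, a in zip(p_c, p_a)]
    return min(p_layers), min(p_c), min(p_a)

# Hypothetical two-layer data (arbitrary consistent units).
layers = [
    {"sigma_c": 50.0, "sigma_a": 1500.0,
     "binding": [(3.0, 0.0, 4.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0, 0.0)],
     "fibres": [(100.0, 0.0, 0.0, 0.0, 0.0)]},
    {"sigma_c": 50.0, "sigma_a": 1500.0,
     "binding": [(2.0, 2.0, 0.0, 0.0, 0.0)],
     "fibres": [(120.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 50.0, 0.0)]},
]
p_star, p_c_star, p_a_star = laminate_loads(layers)
```

In this made-up example the binding of the first layer is the weakest element, so it sets the load of initial destruction of the whole laminate, while the fibres would first fail in the second layer at a higher load.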

4 The basic system of equations for composite shells

The full system of equations describing the SDS of thin-walled composite plates and shells consists of three groups of relations:
— the equations of equilibrium or motion, which do not depend on the physical properties of the material of the construction;
— the kinematic relations, which are constructed on the basis of certain assumptions about the character of deformation (for composite constructions, the properties of the components of the material and its internal structure substantially influence the choice of these assumptions);
— the physical relations, which connect the stresses and deformations in the construction and reflect the properties of the material it is made of.
Below we consider three types of reinforced shells:

Fig. 3. Types of the reinforced shells: (a) the multilayered reinforced shells; (b) the three-layer reinforced shells; (c) the three-layer polyreinforced shells

a) multilayered reinforced shells of regular structure (Fig. 3a), consisting of a great number of identical layers of reinforcement embedded in an isotropic material (so-called quasi-homogeneous shells);
b) three-layer reinforced shells (Fig. 3b), consisting of two reinforced layers of variable thickness δ1, δ2 separated by a layer of isotropic filler of variable thickness 2H;
c) three-layer polyreinforced shells (Fig. 3c), which combine the first two types, with various families of fibres of different nature and different binding materials in each layer.

Let $x_\alpha$, $x_\beta$ be an orthogonal coordinate system whose coordinate lines are the lines of curvature of the surface; $A_\alpha$, $A_\beta$ the Lamé parameters of this system; $R_1$, $R_2$ the radii of curvature of the normal sections in the directions of the coordinate lines. Assuming that the normal to the surface is directed toward its convex side, we have the following expressions for the Lamé parameters $H_1$, $H_2$, $H_3$ of the spatial orthogonal coordinate system $x_\alpha$, $x_\beta$, $z$ normally connected with the surface:

$$H_\alpha = A_\alpha\xi_\alpha, \qquad H_3 = 1, \qquad \xi_\alpha = 1 + \delta_3 z k_\alpha, \qquad k_\alpha = 1/R_\alpha.$$

We write out the basic system of equations describing the equilibrium of a shell of revolution so that it includes the linear and nonlinear variants of the classical Kirchhoff–Love theory [20], the Timoshenko theory [21] and the Andreev–Nemirovsky theory [1]. This system consists of the equilibrium equations


∂(Aα Tβα ) ∂Aα ∂(Aβ Tαα ) ∂Aβ Tαβ + (δ1 + δ2 )Aα Aβ kα Qα + + Tββ + − ∂xβ ∂xβ ∂xα ∂xα   ∂(Aα Mβα ) ∂Aα ∂(Aβ Mαα ) ∂Aβ Mαβ − + Mββ + − +δ3 kα ∂xβ ∂xβ ∂xα ∂xα −δ4 Aα Aβ kα (Hαα ϑα + Hαβ ϑβ ) + Aα Aβ qα = 0,   ∂(Aβ Qα ) ∂(Aα Qβ ) − + Aα Aβ (kα Tαα + kβ Tββ ) − (δ1 + δ2 ) ∂xβ ∂xα    ∂Aα ∂Aα Mβα ∂Aβ ∂Aβ Mαα 1 ∂ Mαβ + + Mββ + − −δ3 ∂xβ ∂xβ ∂xα ∂xα ∂xα Aα $   ∂Aβ ∂Aβ Mαβ ∂Aα ∂Aα Mββ 1 ∂ Mβα + + Mαα + − + ∂xα ∂xα ∂xβ ∂xβ ∂xβ Aβ $ ∂ ∂ [Aα (Hβα ϑα + Hββ ϑβ )] + [Aβ (Hαα ϑα + Hαβ ϑβ )] + +δ4 ∂xβ ∂xα +Aα Aβ qn = 0,

(9)

 ∂(Aα Mβα ) ∂Aα ∂(Aβ Mαα ) ∂Aβ Mαβ − Aα Aβ Qα + + Mββ + − (δ1 + δ2 ) ∂xβ ∂xβ ∂xα ∂xα   ∂(Aα Sβα ) ∂Aα ∂(Aβ Sαα ) ∂Aβ  Sαβ − Aα Aβ Qα + qα = 0, + Sββ + − +δ3 ∂xβ ∂xβ ∂xα ∂xα K  hk k  τα3 [δ2 + δ3 f  (z)]ξα ξβ dz+ (δ2 + δ3 )Qα = k G h α3 k−1 k=1 K    hk  1 ∂Aα k 1 ∂μkαα k k k (μ − μαα ) ξβ dz+ + σαβ +δ3 σαα Aα Aβ ∂xβ ββ Aα ∂xα k=1 hk−1 (  K  hk   1 ∂Aβ k 1 ∂μkαα k k k (μ − μββ ) ξα dz , + σββ + σβα Aα Aβ ∂xα αα Aβ ∂xβ hk−1 

k=1

kinematic relations  vαk

= ξα uα + z(δ1 + δ2 )φα + δ3

λkα

 z ∂w k + μαα πα , v3k = w, − Aα ∂xα

k (δ1 + δ2 )(ϑα − φα ) + (δ2 + δ3 )γα3 =

ϑα = kα uα −

' δ3 & 0h τα3 (z) + f  (z)πα , k Gα3

1 ∂w , ek = 0, Aα ∂xα 33


   1 ∂Aα k δ3 ∂λkα ξβ ∂Aα 1 ∂(ξα uα ) λ + + u β + Aα k α w + + Aβ ∂xβ β Hα ∂xα Aβ ∂xβ ∂xα Hα     1 ∂Aα (δ1 + δ2 ) ∂φα 1 ∂Aα k ∂  k φβ − + μββ πβ + z μαα πα + + Aβ ∂xβ ∂xα Hα Aβ ∂xβ ∂xα *( )   δ4 1 ∂Aα ∂w 1 ∂w ∂ δ3 + ϑ2α , + 2 2 Aβ ∂xβ ∂xβ Hα ∂xα Aα ∂xα (10)

ekαα =

    ξβ ∂Aβ 1 ∂(ξα uα ) ξα ∂Aα 1 ∂(ξβ uβ ) uβ + − uα + − = Aα ∂xα ∂xβ Hβ Aβ ∂xβ ∂xα Hα      φβ Aβ ∂ φα Aα ∂ − + +z(δ1 + δ2 ) Hα ∂xα Aβ Hβ ∂xβ Aα     1 ∂Aβ kβ ∂uα 1 ∂Aα kα ∂uβ uβ + − uα − − − Aα ∂xα Hβ ∂xβ Aβ ∂xβ Hα ∂xα      ∂λkβ 1 ∂Aα ∂w 1 ∂w ∂ 1 ∂Aα k δ3 + − λ −z − + A A ∂xβ ∂xα ∂xα Aβ ∂xβ Aβ ∂xβ α Hα ∂xα $ α β  1 ∂Aα k ∂  k μ πα + μββ πβ − + Aβ ∂xβ αα ∂xα     1 ∂Aβ ∂w 1 ∂w ∂ 1 ∂Aβ k δ3 ∂λkα + − λ −z − + Aα Aβ ∂xα ∂xβ ∂xβ Aα ∂xα Aα ∂xα β Hβ ∂xβ $  1 ∂Aβ k ∂  k μ πβ + δ4 ϑα ϑβ , μ πα − + Aα ∂xα ββ ∂xβ αα

2ekαβ

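expressions for the generalized efforts and moments

$$[T_{\alpha\alpha}, M_{\alpha\alpha}, \delta_3 S_{\alpha\alpha}] = \sum_{k=1}^{K} \int_{h_{k-1}}^{h_k} \sigma^k_{\alpha\alpha}\,\xi_\beta\,[1, z, \delta_3 \mu^k_{\alpha\alpha}]\,dz,$$

$$[T_{\alpha\beta}, M_{\alpha\beta}, \delta_3 S_{\alpha\beta}] = \sum_{k=1}^{K} \int_{h_{k-1}}^{h_k} \sigma^k_{\alpha\beta}\,\xi_\beta\,[1, z, \delta_3 \mu^k_{\beta\beta}]\,dz,$$

$$H_{\alpha\alpha} = T_{\alpha\alpha} + k_\alpha M_{\alpha\alpha}, \quad H_{\alpha\beta} = T_{\beta\alpha} + k_\beta M_{\beta\alpha},$$

$$\mu^k_{\alpha\alpha} = \xi_\alpha \Big[ I^k_\alpha(h_{k-1}, z) + \sum_{j=1}^{k-1} I^j_\alpha(h_{j-1}, h_j) \Big], \quad \mu_{\alpha\beta} = 0,$$

$$I^k_\alpha(a, b) = \int_a^b \frac{f'(t)\,dt}{G^k_{\alpha 3}\,\xi_\alpha(t)}, \tag{11}$$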


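$$\lambda^k_\alpha = \xi_\alpha \Big[ J^k_\alpha(h_{k-1}, z) + \sum_{j=1}^{k-1} J^j_\alpha(h_{j-1}, h_j) \Big], \quad J^k_\alpha(a, b) = \int_a^b \frac{\tau^{0h}_{\alpha 3}(t)\,dt}{G^k_{\alpha 3}\,\xi_\alpha(t)},$$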

+ + K 0 0h 0 h 0 qα = τα3 ξα (h) − τα3 , qα = Aα Aβ τα3 μαα (h), τα3 (z) = τα3 + zh−1 (τα3 − τα3 ),    + + ∂A1 τ23 ∂A2 τ13 h + 0 + + − σ33 − δ3 qn = σ33 ∂x2 ∂x1 A1 A2 ' + + 0 0 − τ13 )ϑ1 + (τ23 − τ23 )ϑ2 . +(τ13

The system of equations (1), (9)–(11) is a system of nonlinear partial differential equations with variable coefficients. Its order does not depend on the number of layers in the shell and equals 12, so a well-posed problem requires six boundary conditions. Setting δ4 = 0 in these equations gives a system of linear equations suitable for studying the SDS of plates and shells at small deflections. The classical Kirchhoff-Love theory corresponds to δ1 = 1, δ2 = δ3 = 0; the Timoshenko theory to δ2 = 1, δ1 = δ3 = 0; and the non-classical Andreev-Nemirovsky theory to δ3 = 1, δ1 = δ2 = 0.

Solving boundary value problems in the form given above is rather inconvenient. For round plates and shells of revolution the usual practice is therefore to reduce the initial boundary value problem to a number of boundary value problems for systems of ordinary differential equations. There are several basic approaches for this: separation of variables by Galerkin's method with a trigonometric basis, finite-difference approximation of the derivatives, or spline approximation of the solution in the circumferential variable [31]. For linear boundary value problems the method used most often is separation of variables with a trigonometric basis, which automatically satisfies the boundary conditions in the circumferential coordinate.
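The choice of theory through the switch parameters can be illustrated by a small sketch. It only encodes the values of δ1, δ2, δ3 listed above; the class and function names are hypothetical, and the value δ4 = 1 for the nonlinear case is an assumption, since the text states only that δ4 = 0 yields the linear system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Switches:
    """Switch parameters delta1..delta4 appearing in equations (9)-(11)."""
    delta1: int
    delta2: int
    delta3: int
    delta4: int


# (delta1, delta2, delta3) for each theory, exactly as listed in the text.
_THEORY_FLAGS = {
    "kirchhoff_love": (1, 0, 0),
    "timoshenko": (0, 1, 0),
    "andreev_nemirovsky": (0, 0, 1),
}


def select_theory(name: str, nonlinear: bool = False) -> Switches:
    """Return the switch values for a named theory.

    delta4 = 0 gives the linear (small-deflection) system.
    delta4 = 1 for the nonlinear system is an assumption of this sketch.
    """
    d1, d2, d3 = _THEORY_FLAGS[name]
    return Switches(d1, d2, d3, 1 if nonlinear else 0)
```

For example, `select_theory("timoshenko")` returns the switches of the linear Timoshenko-type system, and passing `nonlinear=True` keeps the terms multiplied by δ4.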

5 Influence of models of composite materials on behavior of constructions

We compare how the choice of the structural model of the CM affects the behavior of composite constructions, using axisymmetric problems for layered fibrous plates and shells of revolution as examples.

The three-layer reinforced round plate. We consider a three-layer round plate rigidly clamped on its inner edge and loaded by a uniformly distributed external pressure q3 and a tensile force T0 on its outer edge. In the outer layers the reinforcement is laid at angles ψ1 and −ψ1, and in the inner layer at angles ψ0 and −ψ0. Fig. 4 shows the dependence of the intensity of the maximum reduced stresses in carbon-plastic plates (Ea/Ec = 100) on the lay-up angles of the reinforcement in the outer and inner layers. The calculations were performed at $T_0 = 10^3$ N/m, $q_3 = 7 \cdot 10^4$ N/m$^2$. Fig. 4a, 4b and 4c show the maximum reduced stresses in the plate for the models of CM [16], [17] and [18], respectively. Fig. 4d gives the relative difference between the maximum stresses for the models of CM with one-dimensional and two-dimensional fibres.

Fig. 4. The dependence of the intensity of the maximum reduced stresses in a plate: (a), (b), (c) the maximum reduced stresses in the plate for the models of CM [16], [17], [18], respectively; (d) the relative difference between the maximum stresses for the models of CM with one-dimensional and two-dimensional fibres

The maximum relative difference between the maximum stresses calculated with the refined model with one-dimensional fibres [17] and with the model with one-dimensional fibres [18] does not exceed 1%, while for the model with two-dimensional fibres [16] and the model with one-dimensional fibres [18] it reaches 73%. For plates in which the Young's moduli of the reinforcement and the binder are of the same order, the distribution of the relative differences looks different. For an aluminium plate with steel fibres (Ea/Ec = 2.9) the relative difference between the maximum stresses calculated with models [17] and [18] is 8-10%, while for models [16] and [18] the maximum relative difference reaches about 50%. Fig. 5a, 5b and 5c show the maximum reduced stresses in the plate for the models of CM [16], [17] and [18], respectively. Fig. 5d gives the relative difference between the maximum stresses for the models of CM with one-dimensional and two-dimensional fibres. The calculations were performed at $T_0 = 10^5$ N/m, $q_3 = 8 \cdot 10^4$ N/m$^2$.

Fig. 5. The dependence of the intensity of the maximum reduced stresses in a plate: (a), (b), (c) the maximum reduced stresses in the plate for the models of CM [16], [17], [18], respectively; (d) the relative difference between the maximum stresses for the models of CM with one-dimensional and two-dimensional fibres

The numerical experiments have shown that when the Young's moduli of the binder and the reinforcement differ greatly, the characteristics of the SDS calculated with the two one-dimensional models differ only slightly. As this ratio decreases, the influence of the additional terms in the refined model with one-dimensional fibres grows. Comparison of the characteristics of the SDS calculated with the models with one-dimensional and two-dimensional fibres has shown that there are ranges of the reinforcement parameters in which the difference in results does not exceed 10% for various ratios of the Young's moduli of the binder and the reinforcement. Nevertheless, there are other ranges of the reinforcement parameters in which the difference in results can reach 50-70%.

The three-layer reinforced cylindrical shell. We consider a three-layer reinforced cylindrical shell with rigid end caps under constant internal pressure. The inner layer of thickness h1 is reinforced by a circumferential family of reinforcement, the middle layer of thickness h2 by spiral families at angles ψ and −ψ, and the outer layer of thickness h3 by a longitudinal family of reinforcement; h1 = h3 = 0.1h.

Fig. 6. The maximum reduced stresses in the longitudinal (bs1) and spiral (bs3) families of reinforcement and the deflections of carbon-plastic shells as functions of the spiral reinforcement angle: (a) the classical theory; (b) the nonclassical theory

Fig. 6 presents the maximum reduced stresses in the longitudinal (bs1) and spiral (bs3) families of reinforcement and the deflections of carbon-plastic shells as functions of the spiral reinforcement angle. Fig. 6a corresponds to values calculated by the classical theory, Fig. 6b to the nonclassical theory [1]. Here and below, curve 1 corresponds to values calculated with the filament model [19], curve 2 to the model with one-dimensional fibres [18], curve 3 to the refined model with one-dimensional fibres [17], and curve 4 to the model with two-dimensional fibres [16]. Fig. 6 shows that the results obtained with the model with one-dimensional fibres and with the refined model with one-dimensional fibres practically coincide. The difference between the results obtained with the models with one-dimensional and two-dimensional fibres at ψ > 45° does not exceed 10% for both the classical and the nonclassical theory. Using the filament model introduces an error of up to 30% compared with the model with one-dimensional fibres.

The three-layer reinforced conical shell. We consider a three-layer reinforced conical shell with rigid end caps, subjected to constant internal pressure. The inner layer of thickness h1 is reinforced by a longitudinal family of reinforcement, the middle layer of thickness h2 by a circumferential family, and the outer layer of thickness h3 by spiral families of reinforcement at angles ψ and −ψ.

Fig. 7. The dependence of the maximum reduced stresses in the binder (bs0) and in the longitudinal family of reinforcement (bs1) of carbon-plastic shells (h1 = h3 = 0.1h) on the lay-up angle of the spiral family of reinforcement

Fig. 7 presents the dependence of the maximum reduced stresses in the binder (bs0) and in the longitudinal family of reinforcement (bs1) of carbon-plastic shells (h1 = h3 = 0.1h) on the lay-up angle of the spiral family of reinforcement. Fig. 7 shows that the model with one-dimensional fibres and the refined model with one-dimensional fibres give practically identical results over the whole range of parameters, so in this case neglecting the work of the binder material in the reinforcing layer introduces an error that does not exceed 3%. However, completely neglecting the work of the binder, i.e. using filament models of the CM, leads to significant differences in the results; for the stresses in the circumferential family of reinforcement, for example, they amount to about 50% compared with the model with one-dimensional fibres.

Using the structural model with two-dimensional fibres essentially changes the picture of the SDS of the construction. For example, for the stress in the binder at ψ > 30° the difference between the results obtained with the models with two-dimensional and one-dimensional fibres grows with ψ and, at ψ close to 60°, reaches 50% for the classical theory and 55% for the nonclassical theory. For the stresses in the longitudinal family of reinforcement the difference is from 20% to 30%. The sensitivity to a change of the spiral reinforcement angle is more pronounced when the model with two-dimensional fibres is used. In particular, increasing the spiral reinforcement angle leads to a fourfold increase of the stress in the binder, whereas with the model with one-dimensional fibres the increase is only 1.5-fold.

The influence of the structural model of the CM is much stronger for materials in which the Young's moduli of the reinforcement and the binder are of the same order, for example for metal composites.

Fig. 8. The results for a conical shell with an aluminium binder and steel fibres: panels (a) and (b)

In particular, in Fig. 8 for a conical shell with an aluminium binder and steel fibres (h1 = h3 = 0.4h) the difference between the results obtained with the model with one-dimensional fibres and with the refined model with one-dimensional fibres is from 15% to 25% for the stresses in the binder and up to 30% for the stresses in the longitudinal family of reinforcement. The difference between the results for the models with one-dimensional and two-dimensional fibres is, in this case, from 30% to 45% for both the classical and the nonclassical theory. Using filament models leads to a difference of 200-300% from the results obtained with the models with one-dimensional or two-dimensional fibres. Note that, in contrast to the carbon-plastic conical shell, in this case the dependence of the maximum reduced stresses in the elements of the CM on the spiral reinforcement angle is less pronounced.

It was shown above that the choice of the structural model of the CM can affect the SDS of a construction very strongly. However, there are structures of CM for which the differences between the results obtained with the various models of CM are not so significant.


We study the influence of the choice of the structural model of the CM on the behavior of a layered conical shell for various ratios between the mechanical characteristics of the reinforcement and the binder. Fig. 9 presents the dimensionless stress intensities in the binder $\sigma_0^*$ and in the spiral families of reinforcement $\sigma_{(3)}^*$ as functions of the parameter Ω = Ec1/Ea1, with Ea1 = Ea2 = Ea3. The order of arrangement and the thicknesses of the layers correspond to the parameters for 6, ψ = 60°. Here $\sigma_0^* = \sigma_0/P$, $\sigma_{(i)}^* = \sigma_{(i)}/P$ (i = 1, 2, 3), where $\sigma_0$ and $\sigma_{(i)}$ are the stress intensities in the binder and in the i-th family of reinforcement, respectively.

Fig. 9. The dimensionless stress intensities in the binder $\sigma_0^*$ and in the spiral families of reinforcement $\sigma_{(3)}^*$: panels (a) and (b)

Fig. 9 shows that for practically all the materials considered the model with one-dimensional fibres and the refined model with one-dimensional fibres give identical results in both the classical and the nonclassical case. Only at Ω < 5, which corresponds, for example, to metal composites, can the difference between the stresses in the binder obtained with these models reach 15%. Using the model of a material with two-dimensional fibres leads to completely different results. For both the classical and the nonclassical theory the difference between the stresses obtained with the models with one-dimensional and two-dimensional fibres is from 40% to 60% for the binder and from 20% to 40% for the spiral family. Moreover, the greater the difference between the Young's moduli of the reinforcement and the binder, the greater the difference between the results.

6 On the statements and solutions of inverse problems for composite plates and shells

The problem of optimum design of thin-walled composite plates and shells in its complete formulation is extremely complex. Its complexity is caused by the variety of shell forms used in engineering, the wide spectrum of requirements placed on them and the great variety of their working conditions. Researchers are therefore compelled to consider particular statements in which one requirement or another is singled out as the basic one for the subsequent solution of the optimization problem. The most frequently used optimality criteria are the requirements of minimum weight or minimum cost (when the material of the construction is non-uniform).

Rational design of thin-walled constructions has become widespread in practice. The best-known criteria of rationality are the requirements of equal strength, equal stress, equal strain, a momentless SDS, semi-rigidity, etc. For reinforced constructions the most frequent criteria are the condition of equal-stress reinforcement and the requirement that the reinforcement trajectories coincide with the lines of the principal stresses. The criteria of rationality are effective because, in contrast to the general condition of minimum weight, they are written directly in terms of the parameters that determine the SDS of the construction, which simplifies the statement of the design problem. Cases in which the criteria of rationality lead to optimal plates and shells are also possible.

From the point of view of rational design, the ideal shell constructions are those in which the momentless state is realized, since the material then works uniformly through the thickness of the shell. For reinforced shells the creation of momentless designs is especially important, because it also removes such a shortcoming as their weakened resistance to transverse shear.
Achieving a momentless state without special measures is possible only in exceptional cases and under certain restrictions on the character of the loading, the fastening, the reinforcement and the form of the shell surface.

The concepts of equal strength and equal stress play an important role in the theory of optimum and rational design. The requirements of the absence of strength reserves and of the simultaneous failure of all parts of a construction are associated in practice with the conditions of minimum weight and are accepted as an optimality criterion. However, the concepts of optimality, equal strength and equal stress are not always identical. Nevertheless, the principles of equal strength and equal stress have an important independent value, since their use substantially simplifies the problem of optimum design and reduces it to the solution of certain inverse problems of the theory of elasticity.

Proceeding from the position that all loads are carried by the reinforcing material, while the binder mainly ensures the uniform transfer of loads to the elementary fibres, the requirement of equal-stress reinforcing fibres is frequently used as the criterion of rationality for constructions made of fibrous composites. This criterion is quite natural from the practical point of view, since the capabilities of the reinforcement are then used most fully.

Suppose that, for operational or economic reasons, L requirements are imposed on a shell and formulated as

$$\Phi_l(s, p, q) = 0, \quad (l = 1, \dots, L), \tag{12}$$

where s is the vector-function describing the SDS of the shell; p is the vector-function determining the geometry and wall thickness of the shell and the structural and mechanical properties of the CM; q is the vector-function of external actions. In the case of the refined theory [1], relations (12) together with equations (1), (9)–(11) form a system of (15K + 25 + L) equations that is overdetermined with respect to the (15K + 25) unknowns contained in the state vector-function s. One can try to make this system solvable by means of the parameters contained in the design function p and the loading function q. At least two ways are possible. The first consists in closing the system of equations (1), (9)–(11) by introducing L additional unknowns from p and q and then solving numerically the nonlinear boundary value problems for a system of (15K + 25 + L) equations in (15K + 25 + L) unknowns, which leads to significant mathematical difficulties. The second, more promising way consists in a preliminary study of the overdetermined system of equations (1), (9)–(11) and the derivation of the conditions of its solvability after eliminating the (15K + 25) unknown state functions:

$$\Psi_l(p, q) = 0, \quad (l = 1, \dots, L), \tag{13}$$

on the basis of which a wide class of statements of rational design problems can be formulated. In the most general form, the problem of realizing a rational SDS of a reinforced shell is formulated as follows: determine the laws of loading and of thickness variation, the structure of the composite material and the form of the median surface of the shell for which the solvability conditions (13) are satisfied identically. A review and analysis of approaches to the problem of rational design of reinforced shells is given in [32]. Some concrete results in this direction are published in [33–52].
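The counting argument behind relations (12) and (13) can be written out as a short sketch. The function names are hypothetical; the only inputs taken from the text are the counts (15K + 25) and L.

```python
def state_unknowns(num_layers: int) -> int:
    """Number of unknown state functions in s for a K-layer shell: 15K + 25."""
    return 15 * num_layers + 25


def total_equations(num_layers: int, num_requirements: int) -> int:
    """Equations (1), (9)-(11) plus the L requirements (12): 15K + 25 + L."""
    return state_unknowns(num_layers) + num_requirements


def design_unknowns_needed(num_layers: int, num_requirements: int) -> int:
    """Design or loading parameters (from p and q) needed to close the system."""
    return total_equations(num_layers, num_requirements) - state_unknowns(num_layers)
```

For a three-layer shell with two requirements the system has 72 equations for 70 state unknowns, so two functions from p and q must be freed, which matches the number L of solvability conditions (13).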


7 Designing of nodoid shells

Of great practical interest are shells of revolution that enclose the maximum internal volume for the minimum lateral surface area. If, in addition, an opening of a given radius must be provided on the axis of the shell, the hemisphere ceases to be the best shell. Nodoid shells have this property; the equation of their generatrix in parametric form is [53]

$$x = (2\lambda - R_0)F(k', \theta) + R_0 E(k', \theta), \quad r = R_0\sqrt{1 - k'^2 \sin^2\theta}, \tag{14}$$

where $k = (2\lambda - R_0)/R_0$ is the modulus of the elliptic integrals; $k' = \sqrt{1 - k^2}$ is the complementary modulus; $F(k', \theta)$, $E(k', \theta)$ are the elliptic integrals of the first and second kind; $R_0$ is the radius at the junction point; θ is the current coordinate; λ is the mean radius of curvature, for which the following relations hold:

$$R_2 = \frac{2\lambda r^2}{r^2 - c}, \quad R_1 = \frac{2\lambda r^2}{r^2 + c}, \quad \frac{1}{R_2} + \frac{1}{R_1} = \frac{1}{\lambda}, \quad c = R_0^2 - 2\lambda R_0.$$

Let us consider the problem of designing a quasi-homogeneous momentless nodoid shell reinforced by a circumferential family and two spiral families of equal-stress fibres symmetric about the meridian, and subjected to constant internal pressure. Closed rigid covers are placed on the boundaries of the shell.

Statement 1(h, ψ). The analytical solution of the design problem, in which the rational state of the shell is ensured by choosing a special law for profiling the wall thickness and the lay-up angle of the spiral families of fibres, has the form:

$$2h\varepsilon^{*} = (T_1 + T_2)(2\alpha + 2\omega_1 E_1 + \omega_2 E_2)^{-1}, \quad \alpha = aE_0(1 - \nu)^{-1},$$

$$\sin^2\psi = [\alpha(T_2 - T_1) + 2\omega_1 E_1 T_2 - \omega_2 E_2 T_1]\,[2\omega_1 E_1 (T_1 + T_2)]^{-1}, \tag{15}$$

$$T_1 = J(r\sin\theta)^{-1}, \quad T_2 = R_2 q_3 - J(R_1 \sin^2\theta)^{-1}, \quad J = \int_{\theta_0}^{\theta} r R_1 (q_3\cos\theta - q_1\sin\theta)\,d\theta + c_0.$$
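The nodoid geometry that enters this solution can be evaluated numerically. The sketch below computes a point of the generatrix (14) and the principal radii of curvature; it is an illustration only, with hypothetical function names, and it evaluates the incomplete elliptic integrals by a simple composite Simpson rule rather than by a library routine (an assumption of this sketch, not the author's method).

```python
import math


def _simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def nodoid_point(theta, lam, r0):
    """Return (x, r) on the generatrix (14) for the parameter theta."""
    k = (2.0 * lam - r0) / r0          # modulus k
    kp2 = 1.0 - k * k                  # square of the complementary modulus k'

    def delta(t):
        return math.sqrt(1.0 - kp2 * math.sin(t) ** 2)

    f_int = _simpson(lambda t: 1.0 / delta(t), 0.0, theta)  # F(k', theta)
    e_int = _simpson(delta, 0.0, theta)                     # E(k', theta)
    x = (2.0 * lam - r0) * f_int + r0 * e_int
    r = r0 * delta(theta)
    return x, r


def principal_radii(r, lam, r0):
    """Return (R1, R2) from the relations following (14).

    R2 is singular where r**2 == c, so callers should avoid that radius.
    """
    c = r0 * r0 - 2.0 * lam * r0
    r1 = 2.0 * lam * r * r / (r * r + c)
    r2 = 2.0 * lam * r * r / (r * r - c)
    return r1, r2
```

At θ = 0 the sketch gives x = 0, r = R0 and R2 = R0, and for any admissible radius the two curvatures sum to 1/λ, which is a convenient check on the reconstructed relations.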

Statement 2(h, ω1). The design problem can also be solved by means of another pair of functions, the wall thickness and the intensity of the spiral reinforcement, so as to obtain momentless nodoid shells with equal-stress families of circumferential and spiral reinforcement. The analytical solution in this case has the form

$$2E_1\omega_1 = [\alpha(T_1 - T_2) + \omega_2 E_2 T_1]\,(T_2\cos^2\psi - T_1\sin^2\psi)^{-1},$$

$$2h\varepsilon^{*} = (T_2\cos^2\psi - T_1\sin^2\psi)(\alpha\cos 2\psi + \omega_2 E_2\cos^2\psi)^{-1}. \tag{16}$$
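Solutions (15) and (16) describe the same rational state with different pairs of design functions, so they should agree: an angle ψ computed from (15) and substituted into (16) should return the original reinforcement intensity and thickness. The sketch below checks this numerically. The function names and the sample values are hypothetical and dimensionless; `w1e1` and `w2e2` stand for the products ω1E1 and ω2E2.

```python
import math


def statement1(t1, t2, alpha, w1e1, w2e2):
    """Solution (15): return (2*h*eps, sin^2 psi) for given forces T1, T2."""
    two_h_eps = (t1 + t2) / (2.0 * alpha + 2.0 * w1e1 + w2e2)
    sin2_psi = (alpha * (t2 - t1) + 2.0 * w1e1 * t2 - w2e2 * t1) / (
        2.0 * w1e1 * (t1 + t2)
    )
    return two_h_eps, sin2_psi


def statement2(t1, t2, alpha, w2e2, psi):
    """Solution (16): return (2*E1*omega1, 2*h*eps) for a given angle psi."""
    s2 = math.sin(psi) ** 2
    c2 = math.cos(psi) ** 2
    denom = t2 * c2 - t1 * s2          # must be nonzero for (16) to apply
    two_e1_w1 = (alpha * (t1 - t2) + w2e2 * t1) / denom
    two_h_eps = denom / (alpha * math.cos(2.0 * psi) + w2e2 * c2)
    return two_e1_w1, two_h_eps


# Illustrative values only (not taken from the paper).
T1, T2, ALPHA, W1E1, W2E2 = 1.0, 3.0, 1.0, 2.0, 1.0
h_eps_15, sin2 = statement1(T1, T2, ALPHA, W1E1, W2E2)
psi = math.asin(math.sqrt(sin2))
two_e1_w1, h_eps_16 = statement2(T1, T2, ALPHA, W2E2, psi)
```

With these values (15) gives sin²ψ = 13/16, and (16) then returns 2E1ω1 = 4 and the same value of 2hε* = 4/7, as expected when the two statements are consistent.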


Fig. 10. The curves of distributions of thickness and intensity of the spiral reinforcing

Fig. 10 gives the curves of the distributions of the thickness and of the intensity of the spiral reinforcement corresponding to the analytical solution (15). Curves 1-4 correspond to the following values of the parameters: 1: E1 = 400 GPa, E2 = 80 GPa, ψ = 20°; 2: E1 = 300 GPa, E2 = 300 GPa, ψ = 10°; 3: E1 = 80 GPa, E2 = 400 GPa, ψ = 20°; 4: E1 = 400 GPa, E2 = 80 GPa, ψ = 48°. It follows from Fig. 10 that there are good opportunities for the technological realization of the rational designs obtained.

References

1. Andreev AN, Nemirovsky YuV (2002) Multilayered anisotropic shells and plates: The bend, stability, vibration. Nauka, Novosibirsk (in Russian)
2. Bolotin VV, Novitchkov YuN (1980) The mechanics of multilayered constructions. Mashinostroenie, Moscow (in Russian)
3. Van Fo Fy GA (1971) The theory of the reinforced materials. Naukova dumka, Kiev (in Russian)
4. Vanin GA (1985) The micromechanics of composite materials. Naukova dumka, Kiev (in Russian)
5. Vasil'ev VV (1988) The mechanics of constructions from composite materials. Mashinostroenie, Moscow (in Russian)
6. Composite materials (1978): in 8 vol. Transl. from engl.: L Broutman, R Crock (eds), Vol. 2: The mechanics of composite materials, J Sendeckyj (ed). Mir, Moscow (in Russian)
7. Vasil'ev VV, Tarnopol'skii YuM (eds) (1990) Composite materials: Reference book. Mashinostroenie, Moscow (in Russian)
8. Christensen P (1982) Introduction in the mechanics of composites. Mir, Moscow (in Russian)
9. Malmeisters AK, Tamuzs VP, Teters GA (1980) Resistance of polymeric and composite materials. Zinatne, Riga (in Russian)
10. Nemirovsky YuV, Reznikov BS (1986) Strength of elements of constructions from composite materials. Nauka, Novosibirsk (in Russian)
11. Pobedrja BE (1984) The mechanics of composite materials. MSU, Moscow (in Russian)
12. Skudra AM, Bulavs FYa (1978) The structural theory of reinforced plastics. Zinatne, Riga (in Russian)
13. Skudra AM, Bulavs FYa (1982) Strength of reinforced plastics. Khimiya, Moscow (in Russian)
14. Fujii T, Zako M (1982) The mechanics of destruction of composite materials. Mir, Moscow (in Russian)
15. Shermergor TD (1977) The theory of elasticity of microinhomogeneous environments. Nauka, Moscow (in Russian)
16. Nemirovsky YuV (1972) Mekhanika polimerov:861–873 (in Russian)
17. Nemirovsky YuV (1970) Dynamics of continuous media, Institute of hydrodynamics SB AS USSR, Novosibirsk 4:50–63 (in Russian)
18. Nemirovsky YuV (1969) Prikl Mekh Tekhn Phys 6:81–89 (in Russian)
19. Obraztsov IF, Vasil'ev VV, Bunakov VA (1977) Optimum reinforcing of shells of revolution from composite materials. Mashinostroenie, Moscow (in Russian)
20. Novozhilov VV (1961) The theory of thin shells. Sudpromgiz, Leningrad (in Russian)
21. Grigorenko YaM, Vasilenko AT (1992) Problems of a statics of anisotropic nonhomogeneous shells. Nauka, Moscow (in Russian)
22. Vikario A, Toland R (1978) Criterion of strength and the analysis of fracture of constructions from composite materials. In: Composite materials: in 8 vol. Transl. from engl.: L Broutman, R Crock (eds), Nauka, Moscow. Vol. 7, Part 1:62–107 (in Russian)
23. Wu EM (1978) Phenomenological criteria of fracture of anisotropic environments. Composite materials: in 8 vol. Transl. from engl.: L Broutman, R Crock (eds), Vol. 2: The mechanics of composite materials, J Sendeckyj (ed). Mir, Moscow (in Russian)
24. Goldenblat II, Kopnov VA (1968) Criteria of strength and plasticity of constructional materials. Mashinostroenie, Moscow (in Russian)
25. Malmeisters AK (1968) Mekhanika polimerov 4:519–534 (in Russian)
26. Nemirovsky YuV (1969) Prikl Mekh Tekhn Phys 5:81–88 (in Russian)
27. Bazhanov VL, Goldenblat II, Kopnov VA et al. (1968) Resistance of fiberglasses. Mashinostroenie, Moscow (in Russian)
28. Tamuzh VP, Teters GA (1979) Mekh komposit mater:34–45 (in Russian)
29. Tsai SW, Khan Kh (1978) The analysis of fracture of composites. Non elastic characteristics of composite materials. Mir, Moscow (in Russian)
30. Tsai SW, Wu EM (1971) J Compos Mater 5:58–80
31. Grigorenko YaM (1996) Some ways to the numerical solution of linear and nonlinear problems of the theory of shells in the classical and improved setting. Prikl mekh 32(6):3–39 (in Russian)
32. Golushko SK, Nemirovsky YuV (1988) The review and the analysis of approaches to a problem of rational designing of the reinforced shells, Krasnoyarsk (Prepr. CC SB AS USSR; No 16) (in Russian)
33. Nemirovsky YuV, Starostin GI (1971) Rep AS USSR 196(4):791–800 (in Russian)
34. Nemirovsky YuV, Reznikov BS (1976) Proc AS USSR, Solid Bodyes Mech 6:160–164 (in Russian)
35. Nemirovsky YuV (1977) Proc AS of USSR, Solid Bodyes Mech 3:65–73 (in Russian)
36. Nemirovsky YuV (1978) To the theory of strict momentless elastic and thermoelastic shells. Solid body mech.: Proceedings of Poland-Soviet Symposium 1974. State science publ. house, Warsaw (in Russian)
37. Nemirovsky YuV, Golushko SK (1985) About rational winding of the reinforced shells of rotation. In: Engineering-physical coll. Publication of TSU, Tomsk (in Russian)
38. Golushko SK, Nemirovsky YuV (1988) About one approach to rational designing of the reinforced shells of rotation. In: Numerical methods of solving of problems of the theory of elasticity and plasticity. Proceedings of X All-Soviet Union Conference, Novosibirsk (in Russian)
39. Golushko SK, Nemirovsky YuV (1989) Designing of the semirigid reinforced shells. In: Spatial constructions in Krasnoyarsk region. Interhigh school coll. KPI, Krasnoyarsk (in Russian)
40. Golushko SK, Nemirovsky YuV (1989) Creating of projects of reinforced shell constructions of the minimal weight. Shokin YuI (ed). Computing problems of the mechanics: Interhigh school coll. Krasnoyarsk State University, Krasnoyarsk:117–130 (in Russian)
41. Golushko SK, Nemirovsky YuV (1990) Rational designing of the compound reinforced shells of revolution. In: Spatial constructions in Krasnoyarsk region. Interhigh school coll. KPI, Krasnoyarsk (in Russian)
42. Golushko SK (1992) Model Meas Contr B. AMSE Press 46(4):13–17
43. Golushko SK (1994) Analysis and design of thin-walled constructions of composite materials. In: Letunovsky VV (ed) Problems of Products Quality Assurance in Mashine-building. Proceedings of International-Technical Conference. KSTU, Krasnoyarsk
44. Golushko SK, Nemirovsky YuV (1997) Direct and inverse problems in the theory of the reinforced shells. In: Models of the mechanics of the continuous media, computing technologies and the automated designing in avia- and mechanical engineering: collection of proceedings. Kazan (in Russian)
45. Golushko SK (1998) Direct and inverse problems in mechanics of composite shells. In: Proceedings of The Sixth Japan-Russia Joint Symposium on Computational Fluid Dynamics. Nagoya University, Nagoya, Japan
46. Golushko SK, Nemirovsky YuV, Odnoval SV (1998) Dynamics of continuous media 113:39–44 (in Russian)
47. Golushko SK, Odnoval SV (1998) Direct and inverse problems of calculation and designing of the reinforced domes. In: Mechanics of flying devices and modern materials: Proceedings of conference. Tomsk State University (in Russian)
48. Golushko SK, Gorshkov VV (2000) Calculation and designing of the combined vessels of pressure with equal-stress reinforcement. In: Proceedings of conference of young scientists, devoted to 10-anniversary of ICT SB RAS Vol. II (in Russian)
49. Golushko SK, Odnoval SV (2000) Efficiency of domes with equal-stress reinforcement. In: Proceedings of conference of young scientists, devoted to 10-anniversary of ICT SB RAS Vol. II (in Russian)
50. Golushko SK, Gorshkov VV (2000) The Method of the decision of direct and inverse problems of the connected reinforced tanks. In: Symmetry and the differential equations: Proceedings of international conference, ICM SB RAS, Krasnoyarsk (in Russian)
51. Golushko SK, Gorshkov VV, Nemirovsky YuV (2000) Designing of tanks with equal-stress reinforcement. In: Problems of optimum designing of constructions: Collection of reports of III-rd All-Russian seminar in 2 vol. NSABU, Novosibirsk Vol. 2 (in Russian)
52. Golushko SK, Nemirovsky YuV, Odnoval SV (2000) Calculation and rational designing of equal-strain composite domes. In: Problems of optimum designing of constructions: Collection of reports of III-rd All-Russian seminar in 2 vol. NSABU, Novosibirsk Vol. 1 (in Russian)
53. Cherevatskii V, Grigor'ev A (1970) To research of nodoid and unduloid shells. In: Researches under the theory of plates and shells. Publication of Kazan University, Kazan Iss. 6–7 (in Russian)

Numerical simulation of plasma-chemical reactors

Yu.N. Grigoryev and A.G. Gorobchuk

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. Interest in the investigation of processes in plasma-chemical etching reactors (PCER) stems from their widespread use in the industrial production of semiconductor devices. The large number and complicated interconnection of the factors that determine the quality and etching rate of wafers essentially limit the possibilities of empirical optimization of PCER. The natural alternative is optimization based on mathematical modelling, for which adequate numerical models are necessary. This paper presents the results of developing a numerical model of plasma-chemical reactors based on the Navier-Stokes equations in the Boussinesq approximation. The model contains original elements that substantially raise its predictive abilities. In particular, it takes into account the infrared radiation of polyatomic molecules, which substantially influences the temperature distribution in the reactors. The mass transfer of active particles additionally includes thermodiffusion. The effects of medium rarefaction and adequate gas-phase and heterogeneous kinetics are considered. The abilities of the proposed model are illustrated by its application to studies of processes in reactors of different design schemes. The results of optimizing etching uniformity by mechanical protectors are considered. The influence of temperature nonuniformities and medium rarefaction on the quality of wafer processing is quantitatively estimated. A comparative investigation of commonly used chemical kinetics models in PCER is presented. The optimum composition of the parent gas mixture with respect to the etching rate is obtained. In conclusion, some prospects for further development of the model are discussed, in particular its application to a virtual plasma reactor in operating systems.

1 Introduction The growing rates of microschemes world production during the last two decades exceed essentially the corresponding indexes of any others production branches. Between 1997 and 2003 the consumer and communication electronics sales have grown from USD 744 billion to about USD 165 trillion. The present-day electronics is based on a silicon technology and such a state will be conserve at least during the nearest ten years. The low temperature


Yu.N. Grigoryev and A.G. Gorobchuk

plasma facilities, the so-called plasma reactors or glow discharge reactors, play an important role in the technological processes of chip production. Such reactors are widely used for etching and deposition of semiconductor films, for photoresist stripping and for some other operations. Very often they form part of complex cluster equipment for chip manufacturing.

Fig. 1. Schemes of plasma-chemical etching reactors: a − “pedestal”, b − “stadium”, c − radial flow reactor. 1, 2 − RF electrodes, 3 − processed wafer, 4 − protector, 5 − feed gas, 6 − RF discharge zone, 7 − inlet, 8 − outlet. The arrowed lines show the directions of the gas flow in the reactor

Numerical simulation of plasma-chemical reactors


Some characteristic schemes of these glow discharge reactors are presented in Fig. 1. A typical reactor consists of two parallel plate electrodes forming an axisymmetric cylindrical chamber in which a high-frequency discharge is excited. The wafer to be processed is placed on one of the electrodes. The originally inert feed gas enters the discharge zone, where active etchant species are produced by electron-impact dissociation. The active species are transported to the wafer and react with it, forming volatile products. The unreacted feed gas and the products of the physical-chemical processes and reactions are pumped out of the reactor. The schemes show that these reactors are neither very complicated nor very expensive. Nevertheless, as early as 1995 world sales of such reactors amounted to USD 2 billion, and they keep growing. This figure gives a rough idea of the number of reactors operating in modern industry. Despite the relatively simple construction, the etching process in a reactor is very complicated. Silicon wafers are processed with complex molecular gases such as CF4, SF6 and their mixtures with oxygen O2 and hydrogen H2. Under a microwave discharge and ion current in the etching chamber, a reacting medium appears that is characterized by simultaneously proceeding ionization, dissociation, and heat and mass transfer with complex chemical reactions. Similar processes take place on the surfaces of the chamber and of the wafer being processed. The quality and production rate of chips depend strongly on a large number of process variables in the reactor, including the parent gas composition, pressure, temperature, frequency and power of the discharge, flow rate and flow configuration. Because the factors that determine the quality of the etched wafer are numerous and interconnected in a complex way, the opportunities for experimental study and optimization of the reactor process are very restricted.
A natural alternative here is mathematical modelling. It is especially necessary because many of the real plasma physics and chemistry mechanisms governing this apparatus are insufficiently understood. Thus there are economic, technical and scientific preconditions for well-directed efforts in the development of mathematical modelling of plasma etching reactors.

2 Numerical model formulation

First, some characteristic features of the authors' numerical model of the plasma etching process are described. The model was developed over several years [1]-[15], with its adequacy and predictive capability improved step by step. Today the model fully corresponds to world standards in the mathematical modelling of plasma reactors and includes some novel elements.


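2.1 Gas flow and temperature distribution

Under typical operating conditions in plasma reactors the continuum approximation is valid, and the gas flow is laminar, viscous and incompressible. Therefore, the steady Navier-Stokes system with heat transfer in the standard Boussinesq approximation [1] was used to describe the flow. The problem is considered in an axisymmetric statement. In the dimensionless “stream function–vorticity” variables the dynamic equations have the following form:

\[
u\frac{\partial\omega}{\partial\xi} + wA\frac{\partial\omega}{\partial\zeta} - \frac{u\omega}{\xi}
= \frac{1}{Re}\left[\frac{\partial^2(\omega\tilde\eta)}{\partial\xi^2} + \frac{1}{\xi}\frac{\partial(\omega\tilde\eta)}{\partial\xi} - \frac{\omega\tilde\eta}{\xi^2} + A^2\frac{\partial^2(\omega\tilde\eta)}{\partial\zeta^2}\right]
+ \frac{2}{Re}\left[-A\frac{\partial u}{\partial\zeta}\frac{\partial^2\tilde\eta}{\partial\xi^2} + A^2\frac{\partial w}{\partial\xi}\frac{\partial^2\tilde\eta}{\partial\zeta^2} + A\frac{\partial^2\tilde\eta}{\partial\xi\,\partial\zeta}\left(\frac{\partial u}{\partial\xi} - A\frac{\partial w}{\partial\zeta}\right)\right]
- \frac{Gr}{Re^2}\frac{1}{\Delta\theta}\frac{\partial\theta}{\partial\xi}, \qquad (1)
\]

\[
\frac{\partial^2\psi}{\partial\xi^2} - \frac{1}{\xi}\frac{\partial\psi}{\partial\xi} + A^2\frac{\partial^2\psi}{\partial\zeta^2} + \xi\omega = 0, \qquad (2)
\]

\[
u = -\frac{A}{\xi}\frac{\partial\psi}{\partial\zeta}, \qquad w = \frac{1}{\xi}\frac{\partial\psi}{\partial\xi}, \qquad \omega = -\frac{\partial w}{\partial\xi} + A\frac{\partial u}{\partial\zeta}.
\]

Here ξ, ζ are the radial and axial coordinates; u, w are the velocity components; θ is the local temperature; Δθ is the characteristic temperature difference; Re is the Reynolds number; Gr is the Grashof number. For the velocity components on impermeable walls, no-slip boundary conditions were used in the range of operating pressures p = 0.1 − 1.0 torr, and slip conditions at low pressures p = 0.01 − 0.1 torr. The temperature distribution was obtained by solving the energy balance equation with heat transfer at the reactor surfaces [6]:

\[
\tilde\rho\,\tilde c_p\,Pe\left(u\frac{\partial\theta}{\partial\xi} + wA\frac{\partial\theta}{\partial\zeta}\right)
= \frac{1}{\xi}\frac{\partial}{\partial\xi}\left(\xi\tilde\lambda\frac{\partial\theta}{\partial\xi}\right) + A^2\frac{\partial}{\partial\zeta}\left(\tilde\lambda\frac{\partial\theta}{\partial\zeta}\right) - \nabla\cdot q_r, \qquad (3)
\]

where cp is the heat capacity at constant pressure; λ is the gas thermal conductivity; Pe is the Peclet number; qr is the radiative heat flux. At operating pressures p = 0.1 − 1.0 torr the radiative flux qr was calculated in the optically thin layer approximation. The dynamic and energy balance equations were coupled through the temperature dependence of the gas viscosity and through the buoyancy term. The gas viscosity, thermal conductivity and heat capacity were treated as functions of temperature. The boundary conditions on the temperature expressed the balance of heat flows at the solid walls. At low pressures a “temperature jump” condition was used.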


2.2 Physical-chemical kinetics and species concentration distribution

In the general case a binary mixture CF4/O2 was considered as the parent gas [11], [12], because it is widely used in silicon technology. The kinetic model included the following processes: electron-impact dissociation of the binary gas mixture, volume recombination of reactive atoms and radicals, silicon etching, chemisorption of fluorine and oxygen atoms on the Si surface, and recombination and adsorption of CF2 and CF3 at the wafer. The competition between oxygen and fluorine atoms for chemisorption sites on the Si surface was taken into account. The model contains 16 gas-phase reactions and 8 heterogeneous reactions on the wafer. Twelve products of the dissociation and recombination processes are taken into account: F, F2, CF2, CF3, CF4, C2F6, O, O2, CO, CO2, COF and COF2. The concentration distribution of each component was derived from a system of convection-diffusion equations. The gas-phase reactions were incorporated in the right-hand side of this system, which takes the following form:

\[
2Pe_i\left(u\frac{\partial c_i}{\partial\xi} + wA\frac{\partial c_i}{\partial\zeta}\right)
= \frac{1}{\xi}\frac{\partial}{\partial\xi}\left[\xi\,\tilde d_i\,\tilde c_t\left(\frac{\partial x_i}{\partial\xi} + k_T\frac{\partial\ln\theta}{\partial\xi}\right)\right]
+ A^2\frac{\partial}{\partial\zeta}\left[\tilde d_i\,\tilde c_t\left(\frac{\partial x_i}{\partial\zeta} + k_T\frac{\partial\ln\theta}{\partial\zeta}\right)\right] + G_i, \qquad i = 1,\dots,12. \qquad (4)
\]

Here ci, xi are the molar concentration and the molar fraction of species i; c̃t is the molar concentration of the gas mixture; d̃i is the binary diffusivity in CF4/O2; kT is the thermal diffusion ratio; Pei is the diffusion Peclet number of species i; Gi is the generation term of species i. The surface and silicon etching reactions entered the boundary conditions at the wafer. All of these conditions express a balance of the flows of species i.

2.3 Glow discharge structure and electron concentration

The exact calculation of the glow discharge structure requires solving the Boltzmann kinetic equation for the electrons in a mixture of polyatomic gases and radicals. From both the physical and the computational points of view this is a formidable task [16].
Therefore, the simplest model distributions of the electron density in the reactor were used in the parametric calculations.

2.4 Numerical method

The presence of second-order elliptic operators in all equations of the mathematical model allows each equation to be approximated by an implicit iterative finite difference splitting scheme with stabilizing correction. The scheme in general form looks as follows:


\[
\frac{\varphi^{k+1/2} - \varphi^{k}}{\tau} = L^{\varphi}_{\xi}\varphi^{k+1/2} + L^{\varphi}_{\zeta}\varphi^{k} + F(\varphi^{k}), \qquad
\frac{\varphi^{k+1} - \varphi^{k+1/2}}{\tau} = L^{\varphi}_{\zeta}\left(\varphi^{k+1} - \varphi^{k}\right).
\]

The approximation order is $O(\tau + h_1^2 + h_2^2)$, where h1, h2 are the mesh sizes along the ξ and ζ coordinates and τ is the iteration parameter. The solution of the original steady-state problem was obtained by the relaxation method. The iterative process was terminated once the relative error in the uniform norm fell below a tolerance εφ chosen in the range 10^−10 to 10^−4:

\[
\max_{\Omega_h}\left|\frac{\varphi^{k+1} - \varphi^{k}}{\varphi^{k+1}}\right| < \varepsilon_\varphi .
\]

Equations (1), (2) for the stream function and vorticity were solved together with the heat transport equation (3). The second-order Thom vorticity condition was applied to the flow problem. The stream function and gas temperature were found at each vorticity iteration. The species concentrations were then calculated from the convection-diffusion equations (4) using the resulting velocity and temperature distributions.
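The two-stage iteration with stabilizing correction can be illustrated on a much simpler model problem than the reactor equations. The following is a minimal sketch, not the authors' code: it assumes the model equation φ_ξξ + φ_ζζ + F = 0 on the unit square with zero boundary values, a uniform grid, and an illustrative iteration parameter `tau`; the function names `thomas` and `solve_model_problem` are hypothetical. The stopping test uses the maximum change divided by the maximum of the solution, a simplified variant of the pointwise relative criterion.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_model_problem(n=16, tau=0.01, eps=1e-10, max_iter=5000):
    """Relax phi_xx + phi_zz + F = 0 to steady state by the splitting scheme."""
    h = 1.0 / n
    m = n - 1                                  # interior nodes per direction
    r = tau / h ** 2
    # Source chosen so that the exact solution is sin(pi x) sin(pi z).
    f = [[2.0 * math.pi ** 2
          * math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
          for j in range(m)] for i in range(m)]
    phi = [[0.0] * m for _ in range(m)]
    a, b, c = [-r] * m, [1.0 + 2.0 * r] * m, [-r] * m   # matrix of I - tau*L

    def second_difference(v, k):
        left = v[k - 1] if k > 0 else 0.0          # zero boundary values
        right = v[k + 1] if k < m - 1 else 0.0
        return (left - 2.0 * v[k] + right) / h ** 2

    it = 0
    for it in range(1, max_iter + 1):
        # Explicit part: L_zeta applied to the current iterate (index j).
        lz = [[second_difference(phi[i], j) for j in range(m)] for i in range(m)]
        # Stage 1: implicit in xi (index i), one tridiagonal solve per j.
        half = [[0.0] * m for _ in range(m)]
        for j in range(m):
            rhs = [phi[i][j] + tau * (lz[i][j] + f[i][j]) for i in range(m)]
            column = thomas(a, b, c, rhs)
            for i in range(m):
                half[i][j] = column[i]
        # Stage 2: stabilizing correction, implicit in zeta, one solve per i.
        new = [thomas(a, b, c, [half[i][j] - tau * lz[i][j] for j in range(m)])
               for i in range(m)]
        change = max(abs(new[i][j] - phi[i][j]) for i in range(m) for j in range(m))
        scale = max(abs(new[i][j]) for i in range(m) for j in range(m))
        phi = new
        if change < eps * scale:
            break
    return phi, it, h
```

At convergence the second stage gives φ^{k+1/2} = φ^{k+1}, so the first stage reduces to the steady difference equation; this is why the splitting error does not pollute the steady solution.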

3 Main results of plasma-chemical reactor modelling

Here some principal results of applying the model to the investigation of plasma etching processes in glow discharge reactors are presented.

3.1 Optimization of reactor design with respect to etching uniformity

First, it is demonstrated that mathematical modelling, even within the framework of a simplified model, can give results useful for technical applications. In [1]-[3] we considered the two most widespread PCER schemes, the “pedestal” and the “stadium”, which are used for individual etching of wafers with diameters up to 500 mm. They are shown in Fig. 1. The end faces of the cylindrical chamber serve as the electrodes between which the RF plasma discharge is excited. The parent gas enters uniformly through the porous wall of the upper electrode. The remaining feed gas and the products of dissociation, recombination and etching are pumped out either radially (“stadium”) or in the axial direction through an annular gap on the periphery of the lower electrode (“pedestal”). It was noted that during operation of these industrial reactors a substantial nonuniformity appeared at the outer edge of the patterns. As a consequence, up to 30% of the initial wafer area was rejected as defective. To minimize such


Fig. 2. Isolines of the stream function and full flow of etchant in a “stadium” type reactor without the protector. 1, 2 − distributions of the diffusion and full flows of etchant in zone A. Processing regime: p = 0.2 torr, Q = 30 cm³/min, I⁺ = 0 μA/cm²; Re = 0.396, PeF = 0.143, Da1,F = 2.88

an edge nonuniformity, it was suggested to surround the pattern with a cylindrical protector of low etching reactivity. The problem was to find the optimal protector geometry. For this purpose the dimensions and operating parameters were taken in the range characteristic of industrial reactors. The etching of silicon Si in a tetrafluoromethane CF4 plasma was chosen as the basic process. In the parametric calculations the multicomponent gas medium in the reactor was treated as a binary gas mixture consisting of the etchant species, fluorine F, which sustains the etching reaction on the wafer, and the feed gas CF4. In this case the concentration distribution of the active species was obtained by solving a single convection-diffusion equation with a generation term describing the production and depletion of the active component in the reactor volume. It was taken in the form

\[ G_F = k_e n_e (1 - 3C_F) - k_{v1} C_F^2 - k_{v2} C_F^2, \]

where ke is the rate constant of electron-impact dissociation of the parent gas, kv1, kv2 are the rate constants of volume recombination of the active species with radicals, and ne is the electron density. To understand the mechanism by which the edge defect appears and the effect of the protector, the vector fields of the fluorine flow densities were calculated:

\[ Q_e = Q_c + Q_d, \]

where Qe is the full flow density, and Qc = CF v and Qd = −DF Ct ∇xF are the convective and diffusion flow densities, respectively. The typical distribution of the full flow density Qe in the “stadium” reactor without protector is presented in Fig. 2.


Fig. 3. Etching rate as a function of radial position along the wafer in a “stadium” type reactor for different protector radii and heights. Processing regime: p = 0.2 torr, Q = 30 cm³/min, I⁺ = 0 μA/cm²; Re = 0.396, PeF = 0.143, Da1,F = 2.88

Near the outer edge there is a characteristic zone A with relatively intensive diffusion of fluorine to the wafer. It is caused by the large difference in the etching reactivities of the wafer and anode materials. The black markers single out the layer where ∂xF/∂ζ = 0 and the diffusion flow changes sign. Consequently, the etching nonuniformity in this case is determined by the nonuniformity of the diffusion flow near the wafer edge. Parametric calculations were carried out for different values of the height and diameter of a reactor. The results allowed us to choose the optimal protector dimensions. The graphs in Fig. 3 show the fluorine concentration distribution along the wafer for different protectors. Curve 3 corresponds to the optimal protector. Fig. 4 presents the full flows in the “stadium” reactor with the optimal protector, whose material has a low reactivity. Fig. 4 shows that an annular protector with the same radius as the wafer completely interrupts the local diffusion flow arising from the difference in reactivities of the wafer and the anode. However, near the top edge of the protector a zone B appears where the influence of the low anode reactivity is preserved. Although these results were obtained for very simple plasma-chemical kinetics, further investigations have shown that the optimal protector also provides high etching uniformity for considerably more complicated models.


Fig. 4. Stream function isolines and distribution of the full flow of etchant in a “stadium” type reactor with the optimal protector (rp = 38 mm, hp = 15 mm). 1, 2 − distributions of the diffusion and full flows of etchant in zone B. Processing regime: p = 0.2 torr, Q = 30 cm³/min, I⁺ = 0 μA/cm²; Re = 0.396, PeF = 0.143, Da1,F = 2.88

3.2 Some extensions of etching reactor modelling

Here some new effects obtained in our studies are briefly discussed. These effects had not been considered before in the literature.

Heat radiation transfer and thermodiffusion

Plasma-chemical etching is usually classed as a low temperature process. Therefore, in the mathematical simulation of PCER one can restrict oneself to an isothermal approach, which often gives satisfactory results [1]-[3]. However, the use of resists with low heat resistance and of thermosensitive polymers on the wafers requires a thorough investigation of the heating of the processed chip and of the reactor construction elements. It is also necessary to determine the influence of heating on the processing quality. The main sources of heating in the reactor are the heat generated on the wafer and the surrounding electrode by energetic ion bombardment, the plasma radiation, the heat released by exothermic reactions and the glow discharge. Heat is removed by complex heat transfer in the reactor chamber and by the cooling system, if one exists. Some nonisothermal effects in the reactor were investigated as functions of the electrode and wafer temperature. The temperature was taken as specified and was varied within the characteristic limits Tw = 300 − 500 K [4]-[7]. The heat radiation transfer in the gas was determined using the optically thin layer approximation. In this approach the source of heat radiation in (3) has the form ∇ · qr = 4κp σT^4, where κp is the Planck mean absorption coefficient of the parent gas. A difficulty was the absence of the necessary data on the emissivity of CF4. Therefore the value κp was


Fig. 5. Distribution of isotherms and full heat flow qh in a “stadium” type reactor. Processing regime: p = 1.0 torr, Q = 50 cm³/min, Tw2 = 500 K, I⁺ = 0 μA/cm²; Re = 0.556, Pe = 0.581, Nu = 0.532, Gr = 60.347

estimated from the emissivity of methane CH4, which has the same structure as the CF4 molecule and similar main vibration modes. The exponential model of a spectral band was used to calculate the spectral absorption coefficient κν. The boundary conditions on the temperature expressed the balance of the convective, conductive and radiative heat flows. For example, at the wafer surface:

\[ -\lambda\frac{\partial T}{\partial z} = \alpha(T_a - T) + \sigma\varepsilon_w\varepsilon\,(T_a^4 - T^4), \qquad 0 \le r \le r_1,\ z = 0, \]

where α is the heat transfer coefficient, Ta is the temperature of the anode, σ is the Stefan-Boltzmann constant, εw is the emissivity of the wafer, and ε is the emissivity of the feed gas. The boundary conditions for the fluorine species took into account the diffusion and thermodiffusion of the active species, its heterogeneous recombination and its consumption in the chemical reactions of spontaneous and ion-induced etching. For instance, the boundary condition at the wafer surface had the following form:

\[ D\left(\frac{\partial x}{\partial z} + k_T\frac{\partial\ln T}{\partial z}\right) = k_s\,x(1 - \mu x) + k_i I^{+}(1 - \mu x)/C_t, \qquad 0 \le r \le r_1,\ z = 0, \]

where D is the binary diffusivity, kT is the thermodiffusion ratio, and ks, ki are the rate constants of spontaneous and ion-induced etching. In the calculations the vector fields of the local heat flow density were analyzed. It consists of three components: qh = qa + qc + qr,


Fig. 6. Distribution of the full flow of active species Qe in a “stadium” type reactor and isolines of concentration (C × 10^−10, mol/cm³). Processing regime: p = 0.2 torr, Q = 50 cm³/min, Tw2 = 500 K, I⁺ = 0 μA/cm²; Re = 0.556, Pe = 0.581, Nu = 0.532, Gr = 2.414, PeF = 0.076, Da*1,F = 0.47, β*2,F = 2757.272, β*3,F = 89.092

where qa = ρ cp T v and qc = −λ∇T are the densities of the convective and conductive heat flows. A characteristic picture of the distribution of the full heat flow qh is shown in Fig. 5. The main contribution to qh comes from heat conduction and heat radiation in the cylindrical volume over the substrate. In particular, near the wafer |qc| ∼ |qr|, and they exceed the convective flow qa by more than two orders of magnitude. At the same time, qc and qr are of the same order at the outlet in the middle part of the reactor, but qa exceeds them by a factor of 1.5 − 2. It is worth noting that if the heat radiation transfer is not taken into account, the temperatures on the isolines in Fig. 5 decrease by approximately 60 K. This allows one to conclude that, although the temperature nonuniformities characteristic of PCER are relatively small, the main heat transfer in the reactor is realized by heat conduction and radiation. Therefore, radiative heat transfer must be taken into account in numerical modelling. Under thermally nonuniform conditions the flow density of active species may be written as follows: Qe = Qc + Qd + Qt, where Qc = Cv is the convective flow, Qd ∼ −D∇x the diffusion flow and Qt ∼ −DkT ∇ln T the thermodiffusion flow. The distribution of the full flows Qe of etchant is shown in Fig. 6. Since the thermodiffusion ratio kT < 0, the vectors Qt are directed along the temperature gradients. The calculations show that |Qd| ∼ |Qt| at lower electrode temperatures Ta = 400 − 500 K outside the zone bounded by the protector, and that they determine the value of the full flow |Qe|, substantially exceeding |Qc|. However, immediately at the substrate |Qt| ≈ (0.1 − 0.2)|Qd|. It


means that the direct contribution of the thermodiffusion Qt to the etching process amounts to 10 − 20%, and under local temperature gradients it may negatively affect the etching uniformity. These conclusions about the significant role of heat radiation and thermodiffusion are of a general character for the typical conditions of the plasma etching process, which was supported by our calculations for other reactor schemes.

Rarefaction effects

The transition to low operating pressures in the range p = 0.01 − 0.1 torr is the nearest prospect in the progress of plasma-chemical etching technology. In this connection it was of interest to investigate the influence of rarefaction effects, such as the increase of molecular transport rates, slip of the flow and the temperature jump at the surface of the processed wafer. As above, the flow of the gas mixture was described by the Navier-Stokes equations in the Boussinesq approximation, but with slip and “temperature jump” boundary conditions on the reactor surfaces [5]-[7]. For instance, on the wafer these conditions were written as follows:

\[
u = 0.94\,\frac{2-q}{q}\,(2 + 0.27q)\,Kn\,\tilde\eta\sqrt{\theta}\,A\frac{\partial u}{\partial\zeta}
+ (1 - 0.6q)\,\frac{\tilde d_F}{4Pe_F}\frac{\partial x_F}{\partial\xi}
+ 0.96\,(0.6 + q)\,\frac{1}{Pe}\frac{\tilde\lambda}{\tilde\rho\,\tilde c_p}\frac{\partial\ln\theta}{\partial\xi},
\]

\[
\theta - \theta_{w2} = 0.77\,\frac{2-q}{q}\,(2 + 0.32q)\,\frac{Kn}{Pr}\frac{\tilde\lambda}{\tilde c_p}\sqrt{\theta}\,A\frac{\partial\theta}{\partial\zeta},
\]

where Pr is the Prandtl number, Kn is the Knudsen number, q is the accommodation coefficient, and θw2 is the wafer temperature. Here it was assumed that at these low pressures the gas mixture is transparent and radiation is negligible. The calculated radial velocity profile in the “stadium” reactor with protector is shown in Fig. 7. The maximum slip velocity is approximately half of the flow velocity in the reactor, but it occurs far from the wafer. The results of these calculations lead to the following conclusions. At lower pressures the heat transfer is carried out by heat convection and heat conduction in equal proportion. The magnitudes of the diffusion flow density Qd and the thermodiffusion flow density Qt are roughly equal near the wafer. Thus the influence of temperature nonuniformities on the wafer quality increases. Despite the substantial slip of the flow velocity on the wafer, the contribution of fluorine advection to etching is very small. With the transition to low pressures the etching rates decrease substantially; the decrease can reach one to two orders of magnitude.
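The choice between no-slip and slip boundary conditions is governed by the Knudsen number. The following rough sketch estimates Kn from the hard-sphere mean free path, l = kB T / (√2 π d² p), and applies the conventional rule of thumb that slip corrections matter for roughly 0.01 < Kn < 0.1. It is an illustration only: the molecular diameter `CF4_DIAMETER`, the characteristic length `REACTOR_LENGTH` and the regime thresholds are assumptions introduced here, not values taken from the paper, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
TORR = 133.322            # pascals per torr
CF4_DIAMETER = 4.7e-10    # m, assumed effective hard-sphere diameter of CF4
REACTOR_LENGTH = 0.05     # m, assumed characteristic reactor dimension


def mean_free_path(pressure_pa, temperature_k, diameter=CF4_DIAMETER):
    """Hard-sphere mean free path of a single-component gas."""
    return K_B * temperature_k / (math.sqrt(2.0) * math.pi * diameter ** 2 * pressure_pa)


def knudsen_number(pressure_pa, temperature_k, length=REACTOR_LENGTH):
    """Ratio of the mean free path to the characteristic length."""
    return mean_free_path(pressure_pa, temperature_k) / length


def flow_regime(kn):
    """Rule-of-thumb classification; the thresholds are conventional, not exact."""
    if kn < 0.01:
        return "continuum"      # no-slip walls are adequate
    if kn < 0.1:
        return "slip"           # Navier-Stokes with slip and temperature jump
    return "transitional"       # continuum model becomes questionable


if __name__ == "__main__":
    for p_torr in (1.0, 0.1, 0.01):
        kn = knudsen_number(p_torr * TORR, 400.0)
        print(f"p = {p_torr} torr: Kn = {kn:.4f} ({flow_regime(kn)})")
```

Because the mean free path is inversely proportional to pressure, lowering the pressure by a factor of ten raises Kn by the same factor, which is what moves the reactor from the no-slip range into the slip range.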


Fig. 7. The radial velocity profile in a “stadium” type reactor. Processing regime: p = 0.01 torr, Q = 50 cm³/min, Tw2 = 400 K; Re = 0.663, Pe = 0.603, Kn = 0.058, Gr = 0.006

Effect of choice of plasma etching kinetics

In the past decade the plasma-chemical etching of silicon in CF4 was studied by many authors for different reactors and several variants of chemical kinetics. Unfortunately, differences in the input data and reactor geometries, together with numerous additional assumptions, made it impossible to compare the results obtained and so to choose an adequate kinetic model. To make such a choice we carried out a series of calculations for the radial flow etching reactor, which is an essential unit in the VLSI industry. The results, obtained within the framework of a single numerical model, give reliable data for comparing the widely used variants of Si/CF4 plasma kinetics with respect to the etching rate [8, 9]. The scheme of the radial flow plasma-chemical etching reactor is shown in Fig. 1 (scheme c). The feed gas enters the reaction chamber at the outer edge of the lower electrode. The RF discharge is excited between the electrodes. The products of the physical-chemical reactions are pumped out through the outlet in the center of the lower electrode. The wafer to be processed is placed on the lower electrode. The dimensions of the reactor were taken from [17]. The operating conditions and process parameters were varied in the range characteristic of industrial reactors. In particular, the following values were chosen: pressure p = 0.5 torr, gas flow rate Q = 300 − 400 cm³/min, average electron density ne = 10^10 cm^−3, electrode temperature Tw = 300 K, wafer temperature Ts = 300 − 500 K. Since only a small fraction of tetrafluoromethane is dissociated under RF-discharge ( 1:

The Cusp kernel function has many advantages. It has compact support and restricts the interactions to a maximum distance equal to the smoothing length h. The kernel function and its derivatives vanish at the outer boundary, and the transition is continuous and smooth, which reduces numerical effects such as oscillations. The second derivative is continuous, which makes the kernel function relatively insensitive to strong disorder of the particle distribution. The possibility of choosing freely among a multiplicity of kernel functions makes the SPH method highly flexible.
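The properties listed above (compact support, vanishing value and derivatives at the outer boundary, continuous second derivative) can be checked on a concrete kernel. The definition of the Cusp kernel is not reproduced in this text, so the sketch below uses a different, widely used kernel, the one-dimensional cubic spline, rescaled so that its support radius equals the smoothing length h as described above. The function name and the rescaling are choices made for this illustration.

```python
def cubic_spline_kernel_1d(r, h):
    """1D cubic spline SPH kernel with support radius h (zero for |r| >= h).

    The standard spline is defined with support 2*H; here H = h/2 so that
    interactions are cut off at one smoothing length.
    """
    H = 0.5 * h
    q = abs(r) / H
    sigma = 2.0 / (3.0 * H)          # 1D normalisation: the kernel integrates to 1
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q ** 2 + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


if __name__ == "__main__":
    h = 1.0
    n = 4000
    dx = 2.0 * h / n
    # Midpoint rule over the support [-h, h].
    integral = sum(cubic_spline_kernel_1d(-h + (k + 0.5) * dx, h) for k in range(n)) * dx
    print("integral over the support:", integral)
    print("value at the support boundary:", cubic_spline_kernel_1d(h, h))
```

Both pieces of the spline and their first and second derivatives match at q = 1 and vanish at q = 2, which is the smooth cut-off behaviour the paragraph describes.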


S. Holtwick and H. Ruder

8 Implementation of the diesel injection

Approaches to the simulation of free surfaces were introduced by Monaghan as early as 1994. The SPH method is in principle able to describe the breakup of diesel jets. Basic investigations of this application were described by Ott [3]. Examination of the influence of inflow boundary conditions on the jet breakup at the beginning of the injection revealed that a realistic breakup behaviour can only be achieved by a stochastic disturbance at the inlet. The viscous stress tensor was implemented in the SPH formalism so that the complete Navier-Stokes equations can be simulated. Ott extended the SPH method to flows of two separated phases, such as occur in the primary jet breakup. This extension was examined with several test problems and proved stable even for very large density variations. Since the SPH method can simulate compressible flows, it is well suited to simulating the processes at extremely high injection pressures.

8.1 Boundaries

Several specific approaches exist in SPH for the treatment of fixed boundaries. For the simulation of diesel injection the border area is modelled with virtual SPH particles, which serve as additional SPH sampling points and represent the boundary conditions. If no explicit boundary conditions are given, the configuration of the virtual particles has to be obtained from the arrangement of the SPH particles in the interior. In the case of the reflecting boundary condition, the virtual particles in the border area are arranged symmetrically to the SPH particles in the interior by mirroring the particle positions. This method corresponds to the notion of a rigid wall.

8.2 Computing expense

A significant disadvantage of the SPH method compared with grid-based methods is its computing expense and memory requirement, which are several times larger. A large share of the computing time is needed for the interaction search.
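One common way to organise the interaction search is a cell list: particles are sorted into square cells of edge length h, so that all partners within one smoothing length lie in the same or an adjacent cell. The sketch below is a generic two-dimensional illustration of that idea, not the search used in sph98 or sph2000; the function name `build_interaction_lists` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_interaction_lists(positions, h):
    """For each particle, list the indices of all other particles closer than h."""
    # Sort particles into cells of edge length h.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        cells[(int(x // h), int(y // h))].append(idx)

    neighbours = [[] for _ in positions]
    for (cx, cy), members in cells.items():
        # Partners can only sit in the 3 x 3 block of cells around (cx, cy).
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                xj, yj = positions[j]
                for i in members:
                    if i == j:
                        continue
                    xi, yi = positions[i]
                    if (xi - xj) ** 2 + (yi - yj) ** 2 < h * h:
                        neighbours[i].append(j)
    return neighbours


if __name__ == "__main__":
    points = [((k * 0.37) % 1.0, (k * 0.61) % 1.0) for k in range(60)]
    lists = build_interaction_lists(points, 0.2)
    print("average number of interaction partners:",
          sum(len(partners) for partners in lists) / len(lists))
```

The cost of the search grows roughly linearly with the number of particles instead of quadratically, but the resulting lists still have to be stored, which is the memory issue the section goes on to discuss.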
The information on interactions, particularly the values of the kernel function and its derivatives, is needed several times in the course of the calculation and is therefore kept in main memory. Because each particle has on average between 100 and 200 interaction partners, the demand on main memory for the interaction lists is enormous. Another reason for the large computing expense is the large ratio of particle speed to particle distance. Even for the fastest moving SPH particle, all interaction partners along its path have to be taken into account. Each particle, however, can only see up to a certain horizon whose radius corresponds to the smoothing length. At a relative speed of the order of one smoothing length per timestep the interactions cannot be accurately

Simulation of diesel injection

Fig. 2. Distribution of the sampling points after 1600 ns at 400 m/s without acceleration

considered. Therefore at high resolutions and large particle velocities the size of the timesteps has to be very small to ensure the accurate simulation of all interactions.
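The restriction can be written as a Courant-type condition: within one timestep a particle, or a pressure signal, should cover only a fraction of the smoothing length. The sketch below is a generic estimate under stated assumptions, not the timestep control of the authors' code: the safety factor `courant = 0.3` and the smoothing length used in the example are illustrative, and the sound speed is inferred from the statement that the jet enters at about 60% of the sound propagation velocity.

```python
def stable_timestep(h, v_max, sound_speed, courant=0.3):
    """Largest timestep for which signals travel at most courant * h per step."""
    signal_speed = v_max + sound_speed
    if signal_speed <= 0.0:
        raise ValueError("signal speed must be positive")
    return courant * h / signal_speed


if __name__ == "__main__":
    jet_speed = 400.0                 # m/s, jet speed quoted in the figure captions
    sound_speed = jet_speed / 0.6     # m/s, assuming the jet runs at 60% of it
    h = 1.0e-6                        # m, hypothetical smoothing length
    dt = stable_timestep(h, jet_speed, sound_speed)
    print(f"timestep: {dt:.3e} s")
```

Halving the smoothing length to double the resolution halves the admissible timestep as well, so the cost of a simulation grows faster than the particle number alone.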

9 Simulation results

9.1 Distribution of the sampling points

Fig. 2 illustrates the distribution of the sampling points. Diesel particles are shown in blue, air particles in red. Every individual sampling point is displayed. The regular patterns in both domains are not numerical effects but interference (beats) between the particle positions and the limited resolution of the illustration.

9.2 Pressure distribution

The examination of effects on the scale of the smoothing length is problematic. The SPH particles lie much closer together than that, and it does not make sense to evaluate the field quantities for every single particle. Therefore the field quantities are interpolated onto a Cartesian grid whose spacing is of the same order of magnitude as the smoothing length. In the simulation shown, the inflowing diesel and the air inside the chamber have the same pressure of 50 bar. Fig. 3 reveals the compression wave running ahead of the diesel jet, which flows in at about 60% of the sound propagation velocity of the system.

Fig. 3. Pressure distribution after 1200 ns at 50 bar and 400 m/s fluid jet speed (axes: x and y in mm, from 0 to 1.0; pressure scale p from 0 to 250 bar)

10 Current work

The current project is funded by the DFG within SFB 382, Verfahren und Algorithmen zur Simulation physikalischer Prozesse auf Hoechstleistungsrechnern (Methods and algorithms for the simulation of physical processes on supercomputers). An implementation of the SPH method called sph98 was developed earlier at the University of Tübingen. In addition to this existing implementation, another software package called sph2000 is being developed.

10.1 Development of simulation methods

sph2000 is an object-oriented library for parallel particle methods. The aim is to provide an easy-to-use parallel platform for the implementation of further physical interactions and phenomena. At the moment we are working on extending the physics included in the method. Cavitation, surface tension and turbulence have to be implemented as modules for sph2000. The absence of these interactions makes a more realistic simulation of diesel injection impossible at present. We are also working on concepts for a resource-conserving way of enlarging the simulation area by one or two orders of magnitude. One of the approaches is to vary the density of the SPH particles within the simulation area.


10.2 Comparison to other simulation methods

In collaboration with several other working groups at the Universities of Tübingen and Aachen, the results of the simulations carried out with SPH codes are compared with the results of other methods. The working group at Aachen employs the software FLUENT, which is based on the VOF method. The volume of fluid method uses a more or less static grid and calculates the volume fraction in each grid cell by solving transport equations at each timestep. FLUENT offers a simulation performance that is an order of magnitude faster than SPH, but it has the main disadvantage that it is not able to simulate compressible flows. At the University of Tübingen several particle and grid methods are being developed and enhanced. Examples are PIC (particle in cell), FMM (finite mass method) and FEM (finite element method). Many test simulations of fluid jets and astrophysical problems were conducted to verify the results in comparison with other methods.

10.3 Research and development on computing performance

Another part of our work is research on computing performance with regard to hardware solutions and parallelisation methods. For our simulations we use a self-built Linux cluster based on standard PC components and a high-performance Myrinet network. Other hardware is provided by the HLRS in Stuttgart. The parallelisation of SPH simulations is a challenge because of the very dynamic character of the method. Particles do not possess any static neighbourhood relations, which causes a large amount of communication between the computing nodes. New approaches for the parallelisation of object-oriented codes on shared memory architectures had to be devised.

References

1. Speith R (1998) Analysis of SPH on the basis of astrophysical examples. Dissertation, University of Tübingen (in German)
2. Flebbe O (1994) Smoothed Particle Hydrodynamics: Modelling of Superhump Lightcurves. Dissertation, University of Tübingen (in German)
3. Ott F (1999) Advancement and analysis of SPH as to simulate the breakup of free jets in air. Dissertation, University of Tübingen (in German)
4. Gingold RA, Monaghan JJ (1982) Comput Physics 46:429-453
5. Hipp M, Kunze S, Ritt M, Rosenstiel W, Ruder H (2001) Fast parallel particle simulations on distributed memory architectures. In: Proceedings for the HLRS
6. Holtwick S (2001) Simulation of diesel injection using SPH on the Linux-Cluster Kepler. Diploma thesis, University of Tübingen (in German)
7. Kunze S, Renz U, Schnetter E, Speith R (2000) Fluid Jet Simulations using Smoothed Particle Hydrodynamics. In: Proceedings of the 3rd workshop of the HLRS, Karlsruhe

Some features of modern computational mathematics: problems and new generation of algorithms

Yu.M. Laevsky

Institute of Computational Mathematics and Mathematical Geophysics SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. Modern computational technology is based on a new generation of numerical methods and algorithms for the solution of large-size problems of mathematical physics. Problems of approximation, iterative solution of linear systems, design of efficient preconditioners and solution of non-stationary problems are briefly reviewed in terms of the state of the art.

1 Introduction

This paper does not claim to be a survey of the achievements of numerical mathematics as a whole. It presents a fairly subjective viewpoint on those aspects that seem to be the most important. Since it is not a survey, there are no references to original works: in a paper of limited size it is hardly possible to cite monographs, surveys and original papers on the associated subjects consistently. Nevertheless, the present discussion is based on an analysis of recent reviews of the key areas of numerical methods for the solution of partial differential equations (PDE). In particular, we draw extensively on the surveys [1]-[4].

At present, large-size problems of mathematical physics are understood to be problems whose discrete models have $10^6$ or more degrees of freedom. Here we deal with models described by partial differential equations. The treatment of large-size problems has called for fundamentally new approaches, which allow us to speak of a new generation of algorithms, although the term is a matter of convention. For such problems, the assimilation of source data, the handling of the results obtained, the organization of information flows in the computational process, the ways of storing information, and so on, that is, all the factors integrated in computer technologies, become of primary importance. We will not dwell on these questions here. In solving them, the key role belongs to the computer architecture underlying the modeling, although there are also general technological principles that should be followed when developing computer programs. Finally, one of the major requirements on algorithms of the new generation is the possibility of their efficient parallelization, which is associated with the introduction of large multicomputer systems into computing practice.

The presentation follows this plan. First, we discuss some variational problems and the finite element method; then a number of algorithms for the iterative solution of a finite element system are presented. In Sections 4 and 5 we consider the idea of preconditioning, with domain decomposition and the fictitious domain method as examples of the design of efficient preconditioners. Next, methods based on the use of successive nested grids are considered. For the solution of non-stationary problems we describe some new approaches, which are in fact algorithms of the domain decomposition type, and new multi-level explicit schemes. Finally, some conclusions are given. The main features of any algorithm are its accuracy and efficiency, and all further considerations are made in this context.

2 Discretization and attendant statements of problems

The finite element method (FEM) can be considered the major means of spatial discretization of the equations of mathematical physics. On the one hand, this is connected with the transition to realistic problems in domains with, as a rule, rather complicated geometries, where essentially non-uniform unstructured grids are required. On the other hand, it is connected with the possibility of achieving almost complete universality and automation in the construction of grid systems. The modern view of FEM does not exclude various deviations from the classical Galerkin method: diagonalization of the mass matrix (lumped FEM), application of analogues of directed differences in the approximation of convective terms, combination with the finite volume method, the discontinuous Galerkin method, mixed and hybrid FEM, etc.

In mathematical terms, FEM requires that the original problem can be formulated in variational or projective form. The simpler projective formulation holds for problems with a self-adjoint positive definite operator, which generates an inner product and a norm in some Hilbert space $H$. In this case the boundary value problem for the PDE is formulated as the problem of representing a linear continuous form $f$ in the space $H$: $(u, v)_H = f(v)$. Such a statement is not always obvious. For example, for the Stokes problem with natural boundary conditions the way to such a statement is to write down the equation of motion in the form:


$$-\operatorname{div} \sigma(u, p) = f,$$

where $\sigma(u, p) = \eta\,(\operatorname{grad} u + (\operatorname{grad} u)^T) - pI$ is the symmetric stress tensor, $I$ is the unit tensor, $u$ is the velocity vector and $p$ is the pressure. With the conventional form of the equation of motion, $-\eta \Delta u + \operatorname{grad} p = f$, this works only for the Dirichlet boundary condition $u = 0$. The point is that in the conventional form we “forget” the term $\operatorname{grad}(\operatorname{div} u)$, which is equal to zero by virtue of the incompressibility condition $\operatorname{div} u = 0$. Note that in linear elasticity theory this complication does not arise, simply because there is no incompressibility condition, and the equation of motion has the form $-\mu \Delta u - (\lambda + \mu)\operatorname{grad}(\operatorname{div} u) = f$.

In this setting FEM is the problem of representing a linear continuous form in a finite-dimensional subspace $V_N \subset H$ with a local finite basis, which is usually piecewise polynomial. From the viewpoint of error analysis, FEM is the search for the orthogonal projection of the solution of the differential problem onto a subspace.

In recent years, special attention has been given to the mixed FEM. The mixed FEM implies a new statement of the original problem that allows an increase in the a priori smoothness of the sought solution and, consequently, a decrease in the dimension of the approximating subspace. This is well seen on the example of the equations of motion of a viscous fluid. If, for example, the velocity components in the Stokes problem with boundary condition $u = 0$ are sought in the space of solenoidal functions (the incompressibility condition is built into the definition of the solution space)

$$H = \{u \in (H_0^1(\Omega))^m : \operatorname{div} u = 0\},$$

then, on the one hand, we obtain the generalized problem for the velocity vector, from which the pressure is excluded:

$$a(u, v) \equiv \eta \int_\Omega \operatorname{grad} u : \operatorname{grad} v \, d\Omega = \int_\Omega f \cdot v \, d\Omega, \quad v \in H,$$

and, on the other hand, we obtain the Poisson equation for the pressure, which is sought as an element of the space $H^1(\Omega)/\mathrm{const}$. The FEM approximation of such a statement leads to a rather complicated construction, since the condition $V_h \subset H$ must be satisfied at least approximately. However, the Stokes problem can also be considered in another formulation: find functions $u \in (H_0^1(\Omega))^m$ and $p \in L_2(\Omega)/\mathrm{const}$ such that

$$a(u, v) + \int_\Omega p \operatorname{div} v \, d\Omega = \int_\Omega f \cdot v \, d\Omega, \quad v \in (H_0^1(\Omega))^m,$$

$$\int_\Omega q \operatorname{div} u \, d\Omega = 0, \quad q \in L_2(\Omega)/\mathrm{const}.$$

In this statement, the solenoidal property of the velocity field is no longer an a priori requirement but a result of solving the problem, and the pressure is a square-integrable function. Approximation by the mixed FEM is carried out on a pair of spaces $V_h \subset (H_0^1(\Omega))^m$ and $S_h \subset L_2(\Omega)/\mathrm{const}$ and results in an algebraic saddle point problem

$$\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} U \\ P \end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix}$$

with a symmetric but not positive definite matrix. The well-posedness of this system is a consequence of the Ladyzhenskaya–Babuška–Brezzi condition on the finite element spaces $V_h$ and $S_h$: there exists a value $\gamma > 0$ independent of $h$ such that

$$\inf_{q \in S_h} \sup_{v \in V_h} \frac{b(q, v)}{\|q\|_0 \, \|v\|_1} \ge \gamma, \quad b(q, v) = \int_\Omega q \operatorname{div} v \, d\Omega.$$

From the viewpoint of the algebraic saddle point system, this condition means that the matrix $B^T A^{-1} B$ is positive definite with its lowest eigenvalue independent of $h$. The mixed FEM applied to the Poisson equation, the biharmonic equation, etc., leads to quite a similar algebraic system. The only difference from the Stokes problem is that the mixed statement of the problem must first be obtained by introducing a vector variable. In particular, the Dirichlet problem for the Poisson equation has the following mixed variational form:

$$\int_\Omega u \cdot v \, d\Omega + \int_\Omega p \operatorname{div} v \, d\Omega = 0, \quad v \in H(\operatorname{div}; \Omega),$$

$$\int_\Omega q \operatorname{div} u \, d\Omega = \int_\Omega f q \, d\Omega, \quad q \in L_2(\Omega),$$

where $H(\operatorname{div}; \Omega) = \{v \in (L_2(\Omega))^m : \operatorname{div} v \in L_2(\Omega)\}$. It is interesting that the Dirichlet problem has turned into a mixed problem with a natural boundary condition. The mixed FEM is based on nonconforming approximations of vector spaces with the use of vector elements. For example, for the space $H(\operatorname{div}; \Omega)$ the well-known nonconforming elements of Raviart–Thomas type may be used. The essence of the method is the use of vector basis functions and the attachment of degrees of freedom not to the vertices of the partition but to faces (or to edges for the Nédélec elements). For the face elements (Raviart–Thomas type) the normal components of the basis vectors are continuous when passing from one element to another, and for the edge elements (Nédélec type) the tangential components are continuous. In the first case $\operatorname{rot} v = 0$ on each element and in the second case $\operatorname{div} v = 0$. The questions related to the application of the nonconforming vector FEM to various classes of problems are currently being studied intensively.
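As a small numerical illustration of the saddle point system of this section, the following Python sketch eliminates the velocity block and solves for the pressure through the Schur complement $B^T A^{-1} B$. The diagonal block `a_diag`, the single constraint column `b_col` and the right-hand side `f` are made-up toy data, not a finite element discretization; the sketch only shows that the full matrix is indefinite while the Schur complement is positive.

```python
# Toy saddle point system [[A, B], [B^T, 0]] [U; P] = [F; 0].
# A is diagonal and positive definite, B has a single column.
# All numbers are illustrative assumptions.
a_diag = [2.0, 3.0]   # A = diag(2, 3)
b_col = [1.0, 1.0]    # B is a 2 x 1 matrix
f = [1.0, 2.0]


def quad_form(x):
    """Return x^T K x for K = [[A, B], [B^T, 0]] and x = (u1, u2, p)."""
    u1, u2, p = x
    return (a_diag[0] * u1 * u1 + a_diag[1] * u2 * u2
            + 2.0 * p * (b_col[0] * u1 + b_col[1] * u2))


# Schur complement S = B^T A^{-1} B (a 1 x 1 matrix here).
schur = sum(b * b / a for a, b in zip(a_diag, b_col))

# Eliminate U: S P = B^T A^{-1} F, then U = A^{-1} (F - B P).
p = sum(b * fi / a for a, b, fi in zip(a_diag, b_col, f)) / schur
u = [(fi - b * p) / a for a, b, fi in zip(a_diag, b_col, f)]

# Discrete constraint B^T U, which should vanish.
divergence = sum(b * ui for b, ui in zip(b_col, u))
```

With these toy data the quadratic form `quad_form` takes both signs, `schur` is positive, and the computed `u` satisfies the constraint $B^T U = 0$ up to rounding.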

3 Solution of linear systems

Here we dwell only on iterative procedures that have been proposed and studied intensively in the last 10–15 years. As for direct methods (LU-type factorization), there is an opinion that their application to problems with more than $10^5$ degrees of freedom is too expensive, even for modern supercomputers. However, for problems with irregular matrix structures and orders up to $10^4$ to $5 \cdot 10^4$ (depending on the computer used), preference should be given to direct methods.

Among iterative methods, the most frequently used were those of relaxation type, such as Jacobi, Gauss–Seidel, SOR, SSOR and their block modifications. Their convergence theory, based on the concept of regular matrix splitting, is well developed. Alongside relaxation techniques there appeared methods based on projection onto a sequence of Krylov subspaces. The Krylov subspace of degree $n$ is the span of vectors from $\mathbb{R}^N$

$$K^n(A; r) = \operatorname{span}\{r, Ar, \ldots, A^{n-1} r\},$$

where $A$ is a square $N \times N$ matrix. It is interesting to note that modern surveys generally do not discuss methods based on information about the matrix spectrum.

For the iterative solution of the linear system $Au = f$ using the Krylov subspaces $K^n(A; r_0)$, where $r_0 = f - Au_0$ and $u_0$ is the initial approximation, there are two major approaches:

Approach 1. The next approximation $u_k$ is sought by making the residual orthogonal to the current Krylov subspace: $f - Au_k \perp K^k(A; r_0)$.

Approach 2. The next approximation $u_k$ is sought by minimizing the Euclidean norm $\|f - Au_k\|$ over the current Krylov subspace $K^k(A; r_0)$.

For symmetric matrices, the most widespread method within the first approach was for a long time the Lanczos method and, in the case of a positive definite matrix, the one-step conjugate gradient (CG) method. The FOM and GENCG methods also belong to this group. The Bi-CG method, which additionally employs the Krylov subspace $K^k(A^T; s_0)$, was suggested for the solution of non-symmetric problems; however, it is hardly reliable. The most popular methods of the second group are MINRES and GMRES. The MINRES algorithm is suited to symmetric but indefinite systems and employs the same technique of constructing a basis as CG. The GMRES method is based on the Givens rotation technique for reducing the Hessenberg matrix to upper triangular form and is at present a basic means of solving non-symmetric systems. The algorithm GMRES(m), which restarts GMRES every $m$ steps, is in wide current use. There are different modifications: FGMRES, GMRESR, etc. Finally, the Bi-CGSTAB method is widespread for non-symmetric systems. This method is a combination of the Bi-CG and GMRES(1) methods.

According to Golub and van der Vorst, the process of forming the new generation of linear algebra algorithms based on Krylov subspaces has been completed, although a number of questions remain to be solved, such as stopping criteria for the iterations and the development of parallel modifications. The most urgent problem is the construction of efficient preconditioners.
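Approach 1 can be made concrete with the conjugate gradient method. The sketch below is a textbook CG iteration in plain Python for an assumed model problem (the matrix tridiag(-1, 2, -1) of a 1D Laplacian and an arbitrary right-hand side); the function names are illustrative. It stores the residuals so that their orthogonality to the Krylov subspace can be checked.

```python
def matvec(u):
    """Apply the stencil (-1, 2, -1) with zero Dirichlet values at both ends."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(f, tol=1e-12, max_iter=100):
    """Return the CG approximation and the list of residuals r_0, r_1, ..."""
    u = [0.0] * len(f)
    r = list(f)                      # r_0 = f - A u_0 with u_0 = 0
    p = list(r)
    residuals = [list(r)]
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = dot(r, r) / dot(p, ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, ap)]
        residuals.append(r_new)
        if dot(r_new, r_new) ** 0.5 < tol:
            break
        beta = dot(r_new, r_new) / dot(r, r)
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]
        r = r_new
    return u, residuals


f = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
u, residuals = conjugate_gradient(f)
```

Since $r_0$ and $A r_0$ span $K^2(A; r_0)$, the stored residual `residuals[2]` should be orthogonal to both of them up to rounding errors.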

4 Preconditioning

It is well known that the convergence of iterations depends on the condition number of the matrix of the linear system. Systems obtained by discretization of PDE have a bad condition number, which depends on the grid size but, as a rule, is known. The preconditioning problem was clearly defined by Lanczos in 1952: “The construction of the inverse matrix is equivalent to a linear transformation which transforms the given matrix into the unit matrix. The unit matrix can be conceived as extreme case of a well-conditioned matrix whose eigenvalues are all 1.” The purpose of preconditioning is to design a matrix $B$ (the preconditioner) which is, in a sense, close to the original matrix $A$ (the matrix $AB^{-1}$ is close to the unit matrix) and for which the operation $B^{-1} r$ can be carried out efficiently (either by a direct method or by some inner iterative process that exploits a special structure of $B$ and converges much faster than iterations with the matrix $A$).

Among the methods of algebraic preconditioning, in which the matrix $B$ is determined only by the entries of the matrix $A$ without using information about the original differential problem, the most widespread is incomplete LU factorization, that is, an approximate LU decomposition in which small off-diagonal entries of the upper and lower triangular factors are dropped. In the case of a symmetric positive definite matrix $A$, the incomplete Cholesky factorization used as a preconditioner in the CG method (ICCG) yields good results for sufficiently large problems. There are different versions of this method, such as the one with diagonal compensation.


In the last few years works have appeared in which the matrix $M = B^{-1}$ (i.e. the matrix actually needed in the computation) is sought as a sparse matrix that makes the value $\|I - AM\|$ small in a certain matrix norm.

As already mentioned, the above approaches are of a purely algebraic character: they do not use information about the original differential problem and the way it is discretized (i.e. information about the operator, domain, grid, etc.). The purposeful use of such information substantially widens the possibilities of finding efficient preconditioners, and by now optimal preconditioners, in terms of the iterative convergence rate, have been constructed for a number of problems. For problems that reduce after discretization to an algebraic linear system with a symmetric positive definite matrix, the basic notion in constructing a preconditioner is the spectral equivalence of matrices: the matrices $A$ and $B$ are spectrally equivalent if for all vectors $u$ the inequalities

$$\alpha\,(Bu, u) \le (Au, u) \le \beta\,(Bu, u)$$

hold, where the positive numbers $\alpha$ and $\beta$ are independent of the dimension of the matrices. In this case the eigenvalues of the matrix $AB^{-1}$ belong to the segment $[\alpha, \beta]$, and the problem with such a matrix becomes well-conditioned.

Recently, questions connected with the design of efficient preconditioners for saddle point problems have been studied intensively. Answers to these questions are of primary importance for problems of fluid mechanics.
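The spectral equivalence inequalities can be observed on a simple assumed model: $A$ is the 1D difference operator for $-(k u')'$ with a variable coefficient $k$ between 1 and 10, and $B$ is the same operator with $k \equiv 1$. Both energy forms are sums over grid cells, so $\alpha = \min k$ and $\beta = \max k$ work for any grid size. The Python sketch below (all names are illustrative) samples the Rayleigh quotients $(Au, u)/(Bu, u)$ for random vectors.

```python
import random


def energy(u, k):
    """(A u, u) for the 1D operator -(k u')' with zero Dirichlet ends, mesh size 1."""
    padded = [0.0] + list(u) + [0.0]
    return sum(k[i] * (padded[i + 1] - padded[i]) ** 2
               for i in range(len(padded) - 1))


n = 50
random.seed(0)
k_var = [1.0 + 9.0 * random.random() for _ in range(n + 1)]  # coefficient of A
k_one = [1.0] * (n + 1)                                      # coefficient of B
alpha, beta = min(k_var), max(k_var)

ratios = []
for _ in range(200):
    u = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ratios.append(energy(u, k_var) / energy(u, k_one))
```

Every sampled quotient lies in $[\alpha, \beta]$, and these bounds depend only on the range of the coefficient, not on `n`.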

5 Domain decomposition and fictitious domain methods

Domain decomposition methods (DDM) and the fictitious domain method, which is in a sense dual to them, form an extensive part of modern numerical analysis and are not just a means of constructing efficient preconditioners for iterative processes. For this reason, only some of the most important ideas from this area are stated below.

Conceptually, the DDM is a development and generalization of the Schwarz method of alternation in subdomains. The Schwarz method for the problem of representing a linear continuous form in the Hilbert space $H$ may be formulated as the following iterative procedure:

$$u^{k+1/2} = u^k - z^{k+1/2}, \quad z^{k+1/2} \in H_1: \ (z^{k+1/2}, v) = (u^k, v) - f(v) \quad \forall\, v \in H_1,$$

$$u^{k+1} = u^{k+1/2} - z^{k+1}, \quad z^{k+1} \in H_2: \ (z^{k+1}, v) = (u^{k+1/2}, v) - f(v) \quad \forall\, v \in H_2,$$

where the space $H$ is represented as a sum of two closed subspaces: $H = H_1 + H_2$.
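In matrix form, each half-step of the Schwarz method solves a local problem for the current residual restricted to one subdomain. The following Python sketch does this for an assumed model, the 1D system tridiag(-1, 2, -1) u = f with two overlapping index ranges; the helper names and the sizes are illustrative choices.

```python
def thomas(n, rhs):
    """Solve tridiag(-1, 2, -1) x = rhs of size n by the Thomas algorithm."""
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def residual(u, f):
    """Return A u - f for A = tridiag(-1, 2, -1)."""
    n = len(u)
    return [2.0 * u[i]
            - (u[i - 1] if i > 0 else 0.0)
            - (u[i + 1] if i < n - 1 else 0.0)
            - f[i]
            for i in range(n)]


def schwarz_sweep(u, f, lo, hi):
    """Subtract the local correction z that solves A_loc z = (A u - f) on lo..hi-1."""
    r = residual(u, f)
    z = thomas(hi - lo, r[lo:hi])
    return u[:lo] + [u[i] - z[i - lo] for i in range(lo, hi)] + u[hi:]


n = 19
f = [1.0] * n
exact = thomas(n, f)
u = [0.0] * n
errors = []
for _ in range(10):
    u = schwarz_sweep(u, f, 0, 12)   # first subdomain: nodes 0..11
    u = schwarz_sweep(u, f, 8, n)    # second subdomain: nodes 8..18
    errors.append(max(abs(a - b) for a, b in zip(u, exact)))
```

The list `errors` decreases by a roughly constant factor per iteration; in this toy setting the factor is governed by the size of the overlap.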


Let $P_i$ be the orthogonal projectors from $H$ onto $H_i$ in the inner product of the space $H$. Then

$$u^{k+1} - u = (I - P_2)(I - P_1)\,(u^k - u), \quad \|(I - P_2)(I - P_1)\| < 1.$$

This means that the iterative process converges at a geometric rate.

An additive variant of the Schwarz method (ASM), which underlies the construction and analysis of the DDM with overlapping subdomains, for the problem of representing a linear continuous form in the finite-dimensional Hilbert space $H = H_1 + \cdots + H_m$ can be formulated as the following iterative process:

$$u^{k+1} = u^k - \tau\,(z_1^k + \cdots + z_m^k),$$

where $z_i^k \in H_i$ and, for all $v \in H_i$,

$$(z_i^k, v)_{H_i} = (u^k, v)_H - f(v).$$

In this case, convergence of the process at a rate independent of the dimension of the space $H$ is determined by the corresponding independence of the constants in the following conditions: for every element $v \in H$ there exists a representation $v = v_1 + \cdots + v_m$ such that

$$(v_1, v_1)_H + \cdots + (v_m, v_m)_H \le \gamma\,(v, v)_H,$$

and for all elements $w \in H_i$ the inequalities

$$\alpha\,(w, w)_{H_i} \le (w, w)_H \le \beta\,(w, w)_{H_i}$$

hold. In matrix terms, $(u, v)_H = (Au, v)$ and $(u, v)_{H_i} = (B_i u, v)$. Then

$$B^{-1} = B_1^+ + \cdots + B_m^+,$$

and the matrices $A$ and $B$ are spectrally equivalent.

A good illustration of the ASM is the well-known Neumann–Dirichlet process for the solution of the linear algebraic system corresponding to the Dirichlet problem for the Poisson equation in two rectangular subdomains. The matrix of the system has the block form

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix},$$


where the blocks of the matrix $A$ correspond to the partitioning of a vector into two groups: the first group consists of the variables at the inner grid vertices of the first subdomain and at the interface between the subdomains, and the variables at the inner grid vertices of the second subdomain form the second group. Let us consider the matrix

$$B = \begin{pmatrix} A_{1N} + A_{12} A_{22}^{-1} A_{12}^T & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix},$$

where the matrix $A_{1N}$ corresponds to the Neumann problem in the first subdomain. The matrix $B$ is readily invertible:

$$B^{-1} = T A_{1N}^{-1} T^T + \begin{pmatrix} 0 & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}, \quad \text{where} \quad T = \begin{pmatrix} I_{11} \\ -A_{22}^{-1} A_{12}^T \end{pmatrix}$$

is the operator of harmonic extension. Then we can set

$$B_1^+ = T A_{1N}^{-1} T^T, \quad B_2^+ = \begin{pmatrix} 0 & 0 \\ 0 & A_{22}^{-1} \end{pmatrix}.$$

It is known that the operators $A$ and $B$ are spectrally equivalent.

For non-overlapping subdomains the ASM is used in which the Dirichlet problem is considered in each subdomain, and for the variables on the set of interfaces a space of extensions of grid functions from the interfaces into the interiors of the subdomains is considered, i.e.

$$H = H_{int} + H_\Gamma, \quad H_{int} = H_1 \oplus \cdots \oplus H_m,$$

where $H_\Gamma$ is the space of extensions of grid functions. In vector-matrix terms, the linear system for the above example ($m = 2$) reduces to an equation with the Schur complement as the matrix of the system (the index 0 refers to the interface variables):

$$S = A_{00} - A_{10}^T A_{11}^{-1} A_{10} - A_{02} A_{22}^{-1} A_{02}^T.$$
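The reduction to the interface can be traced on an assumed 1D analogue of the two-subdomain example: the matrix tridiag(-1, 2, -1) with `n1` unknowns in the first subdomain, one interface unknown (index 0 in the block notation) and `n2` unknowns in the second subdomain. In this Python sketch the Schur complement is a scalar, and the names `thomas`, `a10`, `a02`, `w1`, `w2` are illustrative.

```python
def thomas(n, rhs):
    """Solve tridiag(-1, 2, -1) x = rhs of size n by the Thomas algorithm."""
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


n1, n2 = 6, 9
n = n1 + 1 + n2
f = [1.0] * n
f1, f0, f2 = f[:n1], f[n1], f[n1 + 1:]

# Coupling blocks: the interface node is linked to the last node of
# subdomain 1 and to the first node of subdomain 2.
a10 = [0.0] * (n1 - 1) + [-1.0]
a02 = [-1.0] + [0.0] * (n2 - 1)

w1 = thomas(n1, a10)                            # A11^{-1} A10
w2 = thomas(n2, a02)                            # A22^{-1} A02^T
schur = 2.0 - dot(a10, w1) - dot(a02, w2)       # S with A00 = 2

# Reduced right-hand side and interface value.
g = f0 - dot(a10, thomas(n1, f1)) - dot(a02, thomas(n2, f2))
u0 = g / schur

# Back-substitution: independent Dirichlet solves in the subdomains.
u1 = thomas(n1, [fi - ai * u0 for fi, ai in zip(f1, a10)])
u2 = thomas(n2, [fi - ai * u0 for fi, ai in zip(f2, a02)])

u_direct = thomas(n, f)                         # reference: undecomposed solve
```

The assembled vector `u1 + [u0] + u2` agrees with `u_direct`; once the interface value is known, the two subdomain solves are independent of each other.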

The continuous analogue of the Schur complement system is an integral equation, and the matrix $S$ is a grid approximation of the Poincaré–Steklov integral operator. There are many publications on the design of preconditioners for this system.

An approach dual to domain decomposition is the fictitious domain method, in which the original problem in a “bad” domain is replaced by a problem in a larger “good” domain, and the preconditioner for the iterative solution of the original problem is designed as a simple, easily invertible operator in this “good” domain. The modern interpretation of the fictitious domain method is most completely represented by the approach generally called the fictitious space method (FSM), which may be formulated as follows (Nepomnyaschikh, 1990):


Let $H_0$ and $H$ be Hilbert spaces with inner products $(u_0, u_0)_0$ and $(u, u)$, let

$$A: H_0 \to H_0, \quad B: H \to H$$

be linear self-adjoint positive definite operators, and let

$$R: H \to H_0, \quad T: H_0 \to H$$

be linear operators such that the operator $RT: H_0 \to H_0$ is the identity and, for all $u_0 \in H_0$ and $u \in H$,

$$(ARu, Ru)_0 \le c_R\,(Bu, u), \quad c_T\,(BTu_0, Tu_0) \le (Au_0, u_0)_0,$$

where $c_R$ and $c_T$ are positive numbers. Then

$$c_T\,(A^{-1}u_0, u_0)_0 \le (RB^{-1}R^* u_0, u_0)_0 \le c_R\,(A^{-1}u_0, u_0)_0,$$

where $R^*: H_0 \to H$ is the operator adjoint to $R$.

Let us consider a simple illustration of this statement, for which the FSM is the fictitious domain method. In the grid domain $\Omega_h$ we need to solve the grid Neumann problem for the Poisson equation $A_N u = f$, where $A_N: H_0 \to H_0$. Let us introduce the rectangular matrix $R = (I_N \ 0)$, where $I_N$ is the unit operator in $H_0$. This operator is defined on the space $H$ of grid functions given on a simple domain $\Pi_h \supset \Omega_h$. Then let $T = (I_N \ t)^T$, where $t$ is an extension operator from $H_0$ to $H$, so that $RT = I_N$. The preconditioner then has the form $B_N = (R A^{-1} R^T)^{-1}$, where $A: H \to H$ is a readily invertible operator for functions given on the “good” domain $\Pi_h$, and it is known that the operators $A_N$ and $B_N$ are spectrally equivalent. The formula for $B_N$ means that if

$$A^{-1} = \begin{pmatrix} C_{11} & C_{12} \\ C_{12}^T & C_{22} \end{pmatrix},$$

then $B_N = C_{11}^{-1}$.
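A 1D sketch of this illustration in Python, under assumptions that are ours and not the paper's: the “good” domain carries the matrix tridiag(-1, 2, -1) with `n0 + n_fict` unknowns, and $H_0$ consists of the first `n0` of them. Applying $B_N^{-1} = R A^{-1} R^T$ then means extending a vector by zeros, solving in the large domain and restricting the result. By block elimination, $B_N = C_{11}^{-1}$ differs from the Dirichlet block only in its last diagonal entry, which is checked below.

```python
def thomas(n, rhs):
    """Solve tridiag(-1, 2, -1) x = rhs of size n by the Thomas algorithm."""
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


n0, n_fict = 8, 5
r = [float(i + 1) for i in range(n0)]

# w = R A^{-1} R^T r: extend by zero, solve in the large domain, restrict.
w = thomas(n0 + n_fict, r + [0.0] * n_fict)[:n0]

# B_N = C11^{-1} is tridiag(-1, 2, -1) with a modified last diagonal entry.
d_last = 2.0 - n_fict / (n_fict + 1.0)
bw = [(d_last if i == n0 - 1 else 2.0) * w[i]
      - (w[i - 1] if i > 0 else 0.0)
      - (w[i + 1] if i < n0 - 1 else 0.0)
      for i in range(n0)]
```

Here `bw` reproduces `r`, so the fictitious-domain solve indeed applies $B_N^{-1}$. The entry `d_last` lies between 1 and 1.5, which places $B_N$ between the Neumann-type matrix (last entry 1) and the Dirichlet matrix (last entry 2) in this toy setting.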


6 Multigrid and multilevel methods

Multigrid methods (MG) are among the most effective means of solving PDE. These methods are based on computation with a sequence of nested grids $\omega_{h_0} \subset \cdots \subset \omega_{h_J}$, which gives rise to a sequence of embedded finite-dimensional spaces $V_{h_0} \subset \cdots \subset V_{h_J}$ and a sequence of operator equations

$$A_j u_j = f_j, \quad u_j \in V_{h_j}, \quad j = 0, \ldots, J.$$

The classical MG started their development back in the 1960s, and their essence is the following recursive procedure:

– at the $j$-th level, $m$ iterations for the $j$-th system are carried out, which give the approximate solution $v_j$ and the residual $r_j$;
– the residual $r_j \in V_{h_j}$ is mapped onto the space $V_{h_{j-1}}$: $r_{j-1} = R_j r_j$;
– by the procedure being described, an approximate solution $\tilde{w}_{j-1}$ to the system $A_{j-1} w_{j-1} = r_{j-1}$ is found;
– the error $\tilde{w}_{j-1}$ is interpolated to the $j$-th level: $\tilde{w}_j = I_j \tilde{w}_{j-1}$;
– at the $j$-th level, setting $\tilde{v}_j = v_j - \tilde{w}_j$ and starting with $\tilde{v}_j$ as the initial approximation, $l$ iterations for the $j$-th system are carried out (suppression of the interpolation error).

What has been described above is the so-called V-cycle. Its modification, the W-cycle, is also widespread. A simple version of the classical MG is the cascade technique, which consists of the interpolation procedure with iterative smoothing and has received a theoretical justification only recently. The “correct” number of iterations on each level underlies the cascade method.

Since the 1980s, the present view of MG as multilevel methods has started to form. These methods are based on FEM with the use of a hierarchical basis. Introducing the ordinary interpolation operators

$$\Pi_j: V_{h_J} \to V_{h_j}, \quad j = 0, \ldots, J,$$

we make use of the representation $\Pi_J = R_0 + \cdots + R_J$, where

$$R_0 = \Pi_0, \quad R_j = \Pi_j - \Pi_{j-1}, \quad j = 1, \ldots, J.$$

Taking into account the fact that the operator $\Pi_J$ is the unit operator in the space $V_{h_J}$, the following expansion is valid:

$$u_J = v_0 + \cdots + v_J,$$


where $v_j = R_j u_J$. This means that the ASM may be used, with

$$B_0^+ = R_0^T A_0^{-1} R_0, \quad B_j^+ = R_j^T D_j^{-1} R_j, \quad j = 1, \ldots, J,$$

where $D_j = \operatorname{diag}(R_j A R_j^T)$, and the preconditioner for the solution of the system on the upper level has the form $B^{-1} = B_0^+ + \cdots + B_J^+$. For this method the spectral condition number $\operatorname{cond}(B^{-1}A)$ depends on $h_J$. Another method that makes use of a hierarchical basis, BPX, is free from this disadvantage: its rate of convergence is independent of $h_J$ as well as of $J$. For BPX the operators $R_j$ are defined as

$$R_0 = Q_0, \quad R_j = Q_j - Q_{j-1}, \quad j = 1, \ldots, J,$$

where $Q_j$ are the $L_2$-projectors:

$$Q_j: V_{h_J} \to V_{h_j}, \quad (Q_j u, v)_{L_2} = (u, v)_{L_2} \quad \forall\, v \in V_{h_j}.$$

In this case

$$B_j^+ = h_j^{2-d} R_j^T R_j, \quad j = 1, \ldots, J,$$

where $h_j = 2^{-j} h_0$ and $d$ is the dimension of the original differential problem. Note that multilevel methods based on the ASM are additive versions of the classical MG techniques, hence they are easily parallelized.

Currently, according to Y. Saad, the users of iterative techniques can be conditionally divided into two groups: those who advocate exclusively the MG algorithms and those who support purely algebraic methods in Krylov spaces. There are some compromise approaches (AMLI), but they are rare. The methods in Krylov spaces can be regarded as general-purpose ones. However, if we speak about a discrete analogue of a PDE for which the MG technique is available (a smooth solution, a regular grid), its application is more efficient.
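The recursive procedure of the classical MG can be sketched in Python for an assumed model problem, the 1D Poisson equation on a uniform grid with $2^k - 1$ interior nodes. Damped Jacobi smoothing, full-weighting restriction and linear interpolation are our illustrative choices for the smoother and for the operators $R_j$ and $I_j$; the residual is taken as $r_j = A_j v_j - f_j$, so that the correction is subtracted as in the V-cycle description.

```python
def apply_a(u, h):
    """Apply the 1D operator -d^2/dx^2 (stencil (-1, 2, -1)/h^2, zero Dirichlet ends)."""
    n = len(u)
    return [(2.0 * u[i]
             - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / (h * h)
            for i in range(n)]


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Damped Jacobi iterations for A u = f."""
    for _ in range(sweeps):
        au = apply_a(u, h)
        u = [ui - omega * (h * h / 2.0) * (aui - fi)
             for ui, aui, fi in zip(u, au, f)]
    return u


def restrict(r):
    """Full weighting; coarse node i coincides with fine node 2i + 1."""
    return [0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2]
            for i in range((len(r) - 1) // 2)]


def interpolate(w, n_fine):
    """Linear interpolation from the coarse grid to the fine grid."""
    fine = [0.0] * n_fine
    for i, wi in enumerate(w):
        fine[2 * i] += 0.5 * wi
        fine[2 * i + 1] += wi
        fine[2 * i + 2] += 0.5 * wi
    return fine


def v_cycle(u, f, h, m=2, l=2):
    if len(u) == 1:
        return [f[0] * h * h / 2.0]                    # exact solve on the coarsest grid
    v = smooth(u, f, h, m)                             # m pre-smoothing iterations
    r = [a - b for a, b in zip(apply_a(v, h), f)]      # r_j = A_j v_j - f_j
    w_coarse = v_cycle([0.0] * ((len(u) - 1) // 2), restrict(r), 2.0 * h, m, l)
    w = interpolate(w_coarse, len(u))
    v = [vi - wi for vi, wi in zip(v, w)]              # coarse-grid correction
    return smooth(v, f, h, l)                          # l post-smoothing iterations


def residual_norm(u, f, h):
    return max(abs(a - b) for a, b in zip(apply_a(u, h), f))


n = 63
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
history = [residual_norm(u, f, h)]
for _ in range(10):
    u = v_cycle(u, f, h)
    history.append(residual_norm(u, f, h))
```

The list `history` records the maximum norm of the residual after each V-cycle. A W-cycle would call the coarse-level recursion twice instead of once.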

7 Methods of solution of non-stationary problems

The introduction into computing practice of large non-stationary problems on unstructured moving grids with time-varying geometry substantially complicates, and sometimes even makes impossible, the application of conventional methods of splitting with respect to the coordinates. Currently, FEM in the spatial variables is usually used in combination with some implicit scheme in time, which is solved iteratively with a suitable preconditioner.

Recently, studies associated with iteration-free DDM have attracted considerable attention. Methods based on splitting schemes over subdomains (rather than along the coordinate directions) have been developed and studied. Schemes both with and without overlapping have been considered. Finally, quite recently, an iteration-free version of the ASM has been proposed. Let us present simple versions of such algorithms.

In the case of overlapping subdomains $\Omega_1 \cup \Omega_2 = \Omega$ the following additive representation is used:

$$\int_\Omega \nabla u \cdot \nabla v \, d\Omega = \int_\Omega \lambda_1 \nabla u \cdot \nabla v \, d\Omega + \int_\Omega \lambda_2 \nabla u \cdot \nabla v \, d\Omega,$$

where $\{\lambda_i(x)\}_{i=1,2}$ is a smooth partition of unity, i.e., $\lambda_1 + \lambda_2 \equiv 1$ in $\Omega$ and $\lambda_i = 0$ in $\Omega \setminus \Omega_i$. For non-overlapping subdomains the following additive representation is used:

$$\int_\Omega \nabla u \cdot \nabla v \, d\Omega = \int_{\Omega_1} \nabla u \cdot \nabla v \, d\Omega + \int_{\Omega_2} \nabla u \cdot \nabla v \, d\Omega.$$

In both cases we deal with an additive representation of the spatial grid operator of the problem, $A = A_1 + A_2$. Then for the approximation in time of the heat conduction equation we can use a scheme of alternating direction type:

$$\frac{u^{n+1/2} - u^n}{\tau/2} + A_1 u^{n+1/2} + A_2 u^n = f^{n+1/2},$$

$$\frac{u^{n+1} - u^{n+1/2}}{\tau/2} + A_1 u^{n+1/2} + A_2 u^{n+1} = f^{n+1/2}.$$
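A minimal Python sketch of this subdomain splitting for an assumed 1D model: the heat equation $u_t = u_{xx}$ on $(0, 1)$ with zero boundary values, a piecewise linear partition of unity $\lambda_1, \lambda_2$ with overlap $[0.4, 0.6]$, and difference operators $A_i$ assembled cell by cell with the weights $\lambda_i$. The grid sizes, the function names and the tridiagonal solver are illustrative assumptions. Outside $\Omega_i$ the matrix $I + (\tau/2) A_i$ reduces to the identity, so each half-step is in effect a solve in one subdomain.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def apply_op(lam, u, h):
    """(A_lam u)_i with cell weights lam[0..n]; zero Dirichlet values at both ends."""
    p = [0.0] + list(u) + [0.0]
    return [(lam[i] * (p[i + 1] - p[i]) - lam[i + 1] * (p[i + 2] - p[i + 1])) / (h * h)
            for i in range(len(u))]


def implicit_solve(lam, rhs, h, half_tau):
    """Solve (I + half_tau * A_lam) x = rhs."""
    n = len(rhs)
    s = half_tau / (h * h)
    lower = [-s * lam[i] for i in range(n)]
    upper = [-s * lam[i + 1] for i in range(n)]
    diag = [1.0 + s * (lam[i] + lam[i + 1]) for i in range(n)]
    return solve_tridiagonal(lower, diag, upper, rhs)


def step(u, lam1, lam2, h, tau, f):
    """One step of the scheme: implicit in A1, then implicit in A2."""
    half = 0.5 * tau
    a2u = apply_op(lam2, u, h)
    u_half = implicit_solve(
        lam1, [ui + half * (fi - ai) for ui, fi, ai in zip(u, f, a2u)], h, half)
    a1u = apply_op(lam1, u_half, h)
    return implicit_solve(
        lam2, [ui + half * (fi - ai) for ui, fi, ai in zip(u_half, f, a1u)], h, half)


n = 39
h = 1.0 / (n + 1)
# Partition of unity evaluated at the cell midpoints.
lam1 = [min(1.0, max(0.0, (0.6 - (e + 0.5) * h) / 0.2)) for e in range(n + 1)]
lam2 = [1.0 - a for a in lam1]
x = [(i + 1) * h for i in range(n)]

u = [math.sin(math.pi * xi) for xi in x]
tau, steps = 0.005, 20
zero = [0.0] * n
for _ in range(steps):
    u = step(u, lam1, lam2, h, tau, zero)
exact = [math.exp(-math.pi ** 2 * tau * steps) * math.sin(math.pi * xi) for xi in x]
error = max(abs(a - b) for a, b in zip(u, exact))
```

In this sketch `error` compares the computed solution with the exact solution $e^{-\pi^2 t}\sin(\pi x)$ of the continuous problem. A stationary solution of $Au = f$ is reproduced exactly by `step`, because $A_1 + A_2 = A$.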

Another multi-purpose approach is based on the use of three-layer schemes. Provided that the spectral equivalence

$$\alpha\,(A_\tau u, u) \le (B_\tau u, u) \le \beta\,(A_\tau u, u) \quad \forall\, u \in H$$

holds, consider the following family of schemes:

$$\frac{\tau}{2\omega}\,(B_\tau - \omega I)\,\frac{u^{n+1} - 2u^n + u^{n-1}}{\tau^2} + \frac{u^{n+1} - u^{n-1}}{2\tau} + A u^n = f^n,$$

whose stability is provided by the inequality $\omega \le \alpha/(1+\delta)$, and the error $O(\tau)$ by the independence of the numbers $\alpha$ and $\beta$ of the dimension of $H$ and of the step $\tau$. It should be emphasized that the operator $B_\tau$ is not required to approximate the identity operator, as is the case in the class of two-layer schemes. Thus all the methods of constructing efficient preconditioners used in the iterative solution of grid elliptic problems can be extended to non-stationary equations.

Advantages of the explicit methods from the parallelization point of view are well known. But very stiff stability conditions were the reason to exclude such algorithms from computational practice, especially for diffusion problems. On the other hand, there are many examples where the use of implicit schemes has failed. For instance, implicit methods for flame propagation problems require a discretization time step that coincides with the step of a stable explicit scheme. The reason for this effect is local instability in the small subdomain where the combustion process takes place. This means that a large time step in an implicit scheme does not provide acceptable accuracy. Thus, we consider a new class of explicit schemes with different time steps in the spatial subdomains and with local stability conditions. Below we present two methods in vector-matrix form. These algorithms can be considered as domain decomposition methods.

Among the variety of existing methods of parallelization, the algorithms considered here are parallelized by means of domain decomposition and the explicit form of the schemes. Briefly, the essence of this method is the following. The basic data of a problem are distributed among the nodes (branches of a parallel algorithm), and the algorithm is the same in all the nodes, but the operations of this algorithm are distributed according to the data available in these nodes. The distribution of the operations of an algorithm consists, for example, in assigning different values to the variable of the same loop in different branches, or in executing a different number of iterations of the same loop in different branches, etc. The homogeneous distribution of data among the nodes (branches) serves as a basis for the balance between the time needed for calculation and the time needed for the interaction of branches.

Let all unknown variables be divided into two groups. Then the matrix corresponding to the diffusion grid operator is presented in the block form

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix}.$$

A two-level scheme of the Dirichlet type is the following:

$$\frac{u_1^{n+1} - u_1^n}{\Delta t} + A_{11} u_1^n + A_{12} u_2^n = f_1^n,$$

$$u_1^{n+k/m} = u_1^n + \frac{k}{m}\,(u_1^{n+1} - u_1^n),$$

$$\frac{u_2^{n+(k+1)/m} - u_2^{n+k/m}}{\tau} + A_{12}^T u_1^{n+k/m} + A_{22} u_2^{n+k/m} = f_2^{n+k/m},$$

where $\Delta t = m\tau$, $n = 0, 1, \ldots$, $k = 0, \ldots, m-1$. Here the vector $u_1^n$ corresponds to the variables in the “external subdomain”, $u_2^{n+k/m}$ corresponds to the variables in the “internal subdomain”, and the equalities in the second line are the linear interpolation at the interface. This means that in the “internal subdomain” we solve the Dirichlet problem. The scale difference is provided by the strong inequality $\|A_{11}\| \ll \|A_{22}\|$.


The main theoretical result is the localization of the stability conditions:

$$\Delta t\,\|A_{11}\| = O(1), \quad \tau\,\|A_{22}\| = O(1).$$
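The localization of the stability conditions can be observed in a small Python experiment. The setup is an assumption made for illustration: a 1D diffusion operator with unit mesh size, coefficient 1 in the “external” group of `n1` nodes and coefficient 16 in the “internal” group, and zero source terms. With $\Delta t = 0.25$ and $m = 16$ both products $\Delta t\,\|A_{11}\|$ and $\tau\,\|A_{22}\|$ are at most 1, whereas a single explicit scheme with the step $\Delta t$ for all unknowns violates its stability condition.

```python
def diffusion_rows(u, k):
    """(A u)_i = k[i] (u_i - u_{i-1}) - k[i+1] (u_{i+1} - u_i), zero Dirichlet ends."""
    p = [0.0] + list(u) + [0.0]
    return [k[i] * (p[i + 1] - p[i]) - k[i + 1] * (p[i + 2] - p[i + 1])
            for i in range(len(u))]


def two_level_step(u, k, n1, dt, m):
    """One large step dt in the first n1 unknowns, m small steps dt/m in the rest."""
    tau = dt / m
    au = diffusion_rows(u, k)
    u1_old = u[:n1]
    u1_new = [u[i] - dt * au[i] for i in range(n1)]
    u2 = u[n1:]
    for sub in range(m):
        theta = sub / m
        # Linear interpolation in time of the external values.
        u1_mid = [(1.0 - theta) * a + theta * b for a, b in zip(u1_old, u1_new)]
        # For brevity all rows are evaluated; only the internal ones are used.
        rows = diffusion_rows(u1_mid + u2, k)
        u2 = [u2[j] - tau * rows[n1 + j] for j in range(len(u2))]
    return u1_new + u2


def euler_step(u, k, dt):
    """Ordinary explicit scheme with the same step dt for all unknowns."""
    return [ui - dt * ai for ui, ai in zip(u, diffusion_rows(u, k))]


n1, n2, m = 10, 10, 16
k = [1.0] * (n1 + 1) + [16.0] * n2     # cell coefficients, interface cell included in group 1
dt = 0.25

u_two_level = [1.0] * (n1 + n2)
u_euler = [1.0] * (n1 + n2)
for _ in range(40):
    u_two_level = two_level_step(u_two_level, k, n1, dt, m)
    u_euler = euler_step(u_euler, k, dt)

peak_two_level = max(abs(v) for v in u_two_level)
peak_euler = max(abs(v) for v in u_euler)
```

In this experiment `peak_two_level` stays below the initial maximum, because every update is a convex combination of old values, while `peak_euler` grows without bound.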

The accuracy of this scheme is O(Δt) in the “external subdomain” and O(τ ) in the “internal subdomain”. Now we will present Dirichlet–Neumann type algorithm with another adjoint condition on the interface. Namely, we use the adjoint condition based on the penalty method. This algorithm approximates a certain auxiliary problem with a discontinuous solution, and convergence to the solution of the original problem is provided by a small positive parameter ε. For simplicity we a present two–level variant of the method. First let us divide all variables into three groups: the variables in the open “external subdomain”, the variables at the interface and the variables in the open “internal subdomain”. Then the diffusion grid operator has the form ⎫ ⎧ A10 0 ⎪ A11 ⎪ ⎪ ⎪ ⎪ ⎪ T (1) (2) . A = ⎪ ⎪ ⎪ A10 A00 + A00 A02 ⎪ ⎪ ⎪ ⎭ ⎩ 0 AT02 A22 Let

⎧ ⎫ A11 A10 ⎪ ⎪ ⎪ ⎪, A1 = ⎪ ⎭ ⎩AT A(1) ⎪ 10 Γ,Γ

⎧ ⎫ (2) ⎪ ⎪ A A ⎪ ⎪ 02 ⎪ A2 = ⎪ ⎭, ⎩ Γ,Γ AT02 A2,2

According to these notations we consider the two groups of the variables. Let us note that we include the variables on the interface into both groups. Then the explicit scheme of the penalty method has the form 1 − un1 un+1 1 + A1 un1 + (B11 un1 − B12 un2 ) = f1n , ε τ k k n+1 n+ m n = u1 + (u1 − un1 ), u1 m k n+ m n+ k+1  m k 1 − u2 u2 n+ k n+ k n+ k T n+ m = f2 m , u1 B22 u2 m − B12 + A2 u 2 m + ε τ where the operator ⎫ ⎧ ⎪ B11 −B12 ⎪ ⎪ B=⎪ ⎭ ⎩ T −B12 B22 corresponds to the internal Newton type boundary conditions in the differential problem: ε

∂u1 + u1 − u2 = 0, ∂ν1

∂u2 ∂u1 =0 + ∂ν2 ∂ν1

on

Γ.

The theory of convergence with respect to the small parameter ε is based on the estimate u − uε 0 = O(ε), where u and uε are the solutions of the original and the perturbed problems, respectively.
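As a brief illustration of why the penalty condition approaches the original matching condition, the first interface relation can be rearranged as follows. This is only a heuristic sketch; it assumes that the normal derivative of $u_1$ stays bounded on $\Gamma$ as $\varepsilon \to 0$, which is not stated in the text.

```latex
u_1 - u_2 = -\varepsilon\,\frac{\partial u_1}{\partial \nu_1}
\quad\Longrightarrow\quad
\left| u_1 - u_2 \right|
  = \varepsilon \left| \frac{\partial u_1}{\partial \nu_1} \right|
  = O(\varepsilon)
\quad \text{on } \Gamma .
```

Under this assumption the jump of the perturbed solution across the interface closes at the same rate in $\varepsilon$ as in the estimate quoted above, while the second relation keeps the normal fluxes balanced for every $\varepsilon$.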


8 Conclusions and outlook

1. Intensive FEM-based studies in search of the finite-dimensional spaces that correspond most adequately to the original differential problem are continuing. Particular attention is being given to the description of vector fields and to the use of unstructured grids.
2. On the whole, there is a sufficiently large set of algorithms for the solution of linear algebraic systems. The major task is the development of high-quality programs that allow for parallelization and include effective stopping criteria, as well as the organization of wide-scale benchmarking on large realistic problems.
3. The development of new efficient ways of solving the linear algebraic systems that result from the approximation of large-size problems is primarily connected with the design of new efficient preconditioners that take into account all parameters of the differential problem affecting the condition number of the linear system. Here the emphasis should be on the following methods: the domain decomposition method, the fictitious space method, and multigrid and multi-level algorithms.
4. When simulating complicated non-stationary processes, one should exploit the new possibilities of explicit schemes and use various implicit-explicit methods.

References

1. Brezzi F, Fortin M (1991) Mixed and Hybrid Finite Element Methods. Springer-Verlag, New York
2. Golub GH, van der Vorst HA (1996) Closer to the Solution: Iterative Linear Solvers. Preprint No 982, Universiteit Utrecht
3. Chan TF, Mathew TP (1994) Acta Numerica 3:61–143
4. Laevsky YuM, Matsokin AM (1999) Siberian J Numer Math 4:361–372

Efficient flow simulation on high performance computers

T. Zeiser¹ and F. Durst²

¹ Regional Computing Center Erlangen, University of Erlangen-Nuremberg, Martensstraße 1, 91058 Erlangen, Germany
² Institute of Fluid Mechanics, University of Erlangen-Nuremberg, Cauerstraße 4, 91058 Erlangen, Germany

Summary. In the last decades, tremendous progress has been made in the area of numerical methods and computer technology. This article gives an introduction to the recent lattice Boltzmann method for simulating the flow of incompressible fluids and shows its application to study the flow in the complex geometry of a randomly packed fixed bed reactor. In addition, general aspects of high performance computing are addressed, e.g. the efficient handling of large amounts of data produced during time-dependent simulations, the performance of recent commodity off-the-shelf (COTS) high performance computers and optimization strategies for them. Finally, the concept of the Federal State of Bavaria for the promotion of high performance techniques is summarized.

1 Introduction

For more than two decades, the performance of computers has been increasing by approximately one order of magnitude every five years. At the same time, numerical algorithms and physical models have steadily been improved with respect to efficiency, applicability and validity. The acceleration obtained through improved or new numerical methods during the last 20 to 30 years (e.g. iterative methods for solving linear systems of equations, multi-grid methods and adaptive algorithms) is therefore of the same order of magnitude as the increase in hardware performance. This means that combining the latest numerical developments with the latest computer technology gives a boost of two orders of magnitude within five years. Thus, the computational performance and the numerical methods are nowadays in principle available to allow numerical simulations of outstanding technical and scientific relevance. However, to successfully tackle such questions, not only access to the hardware and numerical methods, but also several other aspects are of importance. First of all, the potential users have to know of the appropriate numerical


methods and the available performance of the computers. Next, the numerical methods have to be implemented efficiently, taking into account the special requirements of the hardware which will be used later on. Fortunately, several standards have been established successfully (e.g. MPI or OpenMP for parallelization) to make life easier; however, it is still worth knowing as many details of the hardware (CPU, memory, network interconnect) as possible to obtain high performance. Moreover, owing to the short life cycles of high performance computers, it might be necessary to change the complete structure of the simulation program again after only a few years.

In the CFD area, a certain number of commercial flow solvers are available on the market. These packages are feature-rich and adapted to a large number of applications. However, their target platforms are often only desktop systems or small workstation clusters. Therefore, they are often only of limited use for outstanding grand challenge applications.

The computers available nowadays allow the computation of problems with a huge number of degrees of freedom within acceptable times. However, handling all the data in the pre- and post-processing can become a challenge, in particular the huge amount of data produced in time-dependent simulations. Although much effort has been spent on pre-processing tools during the last decades, especially in the context of commercial CFD solvers, pre-processing remains one of the most important and difficult as well as expensive and time-consuming steps of a CFD simulation. For the pre-processing as well as the analysis of the results, a good knowledge of fluid mechanics and a good impression of the flow are still necessary.

Among the many new developments in the CFD area during the last 15 years, the lattice Boltzmann method uses a completely different starting point compared to established CFD solvers. In addition, the method has several features (e.g. simplicity and efficiency while delivering accuracy similar to that of traditional methods) which make it very interesting for many applications, in particular flows in complex geometries or turbulent flows, although a number of important theoretical and methodological questions still have to be addressed.

Some of the aspects mentioned above will be addressed in more detail in the remainder of the article, which is structured as follows. In Sec. 2, the basics of the lattice Boltzmann method as well as some recent improvements of the method are summarized. As mentioned, pre-processing is still one of the most time-consuming tasks of a CFD simulation. Therefore, several innovative ways of getting complex geometrical structures into the computer are briefly reviewed at the beginning of Sec. 3, with a special focus on the marker-and-cell voxel approach usually used with the lattice Boltzmann method. Subsequently, the simulation of transport processes in packed beds is considered as an example of flow simulation in a complex geometry. First, the generation of the packing itself is described (Sec. 3.1), before results of the actual flow simulation (Sec. 3.2) and a possibility of reducing the amount of data which has to be stored to analyze and visualize time-dependent flows (Sec. 3.3) are discussed. Then, more general questions are addressed again. Some comments on recent low-cost, commodity off-the-shelf (COTS) high performance computers and starting points for optimization strategies of lattice Boltzmann flow solvers for such cache-based systems are made in Sec. 4. The public availability of fast and large computer systems as well as recent numerical methods alone do not suffice to make scientists and engineers use them efficiently and on a regular basis. The last section (Sec. 5) therefore gives a brief overview of the Competence Network for Technical and Scientific High Performance Computing (KONWIHR), which has been established by the Federal State of Bavaria to broaden the deployment of these modern techniques in science and engineering.

2 Lattice Boltzmann Approach

2.1 Basic Lattice Boltzmann Model and Theoretical Background

Historically, the lattice Boltzmann method [1] emerged from the lattice gas cellular automata. The first experiments with flow simulations using cellular automata date back to the late 1970s [2]. These first simulations, however, did not fully recover Navier Stokes like behavior of the flow. It took ten years until Frisch and his coworkers realized the importance of the lattice symmetry. Their FHP models [3, 4] were the first cellular automata like approach able to simulate Navier Stokes like flow behavior. These FHP models have historically been the basis for all further developments of lattice gas and lattice Boltzmann methods until, again about ten years later, a rigorous derivation of the lattice Boltzmann equation from the Boltzmann equation was established [5, 6].

Cellular automata [7] follow a completely different philosophy [8] than all methods which are based on macroscopic balance equations (e.g. partial differential equations, PDEs). To numerically solve PDEs, these equations have to be discretized, resulting in a linear system of equations. Cellular automata, on the other hand, consist of a large number of individual cells. The state of all cells is updated according to some simple and uniform rules which depend on the status of the local neighborhood only. All cells are treated by the same set of rules, and the complex behavior of the system emerges with time from the interactions of the cells. The implementation of cellular automata on a computer is straightforward, as they are based on regular cells with a well-defined local neighborhood and discrete values (states). However, the biggest difficulty of the cellular automata approach is to find suitable rules on the microscopic level which reproduce the desired macroscopic behavior [9].
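The idea of a uniform local update rule can be made concrete with a very small sketch. The example below is not a lattice gas or FHP model; it uses a one-dimensional binary automaton (elementary rule 90, chosen here only for illustration), and the function name `ca_step` is hypothetical.

```python
def ca_step(cells, rule=90):
    """Update every cell of a periodic 1-D binary automaton.

    Each new state depends only on the cell and its two neighbors,
    and the same rule table is applied to all cells.
    """
    n = len(cells)
    new = []
    for j in range(n):
        left, centre, right = cells[j - 1], cells[j], cells[(j + 1) % n]
        pattern = 4 * left + 2 * centre + right   # neighborhood as a 3-bit number
        new.append((rule >> pattern) & 1)          # look up the rule table
    return new


state = [0, 0, 0, 1, 0, 0, 0]
history = [state]
for _ in range(3):
    state = ca_step(state)
    history.append(state)
```

Even this trivial rule produces a non-trivial pattern from a single occupied cell, which illustrates how global structure emerges from purely local interactions; finding rules whose emergent behavior matches a given macroscopic equation is the hard part described above.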
Looking at the mathematical and physical background, the lattice Boltzmann method is a kinetic-based approach for fluid flow computations. While usual methods deal directly with the macroscopic variables of interest, e.g. the velocity $u$ and the pressure $p$, by solving the Navier Stokes equations, the lattice Boltzmann approach solves a kinetic equation for the particle velocity distribution function $f(x, \xi, t)$. This distribution function is defined in such a way that $f(x, \xi, t)\,dx\,d\xi$ is the number of particles which at time $t$ are located within a phase-space control element $dx\,d\xi$ about $x$ and $\xi$, where $x$ and $\xi$ are the spatial position vector and the particle velocity vector, respectively. The macroscopic quantities, such as the (mass) density $\rho$ and the momentum (density) $\rho u$, can then be obtained by evaluating the first moments of the distribution function $f$. Neglecting external forces, the transport equation for $f(x, \xi, t)$ can be expressed by the Boltzmann equation as

$$\frac{\partial f}{\partial t} + \xi \cdot \frac{\partial f}{\partial x} = Q(f, f), \tag{1}$$

where the collision term $Q(f, f)$ is quadratic in $f$, consisting of a complex integro-differential expression. A suitable simplification of the collision integral is the single-relaxation-time (SRT) approximation, the so-called Bhatnagar-Gross-Krook (BGK) model [10],

$$Q(f, f) = -\frac{1}{\lambda}\left(f - f^{(0)}\right), \tag{2}$$

where $f^{(0)}$ is the Maxwell-Boltzmann equilibrium distribution function, and $\lambda$ is the relaxation time which controls the rate of approaching equilibrium, or in other words the viscosity of the fluid. Both collision terms fulfill Boltzmann’s H-theorem and locally conserve mass and momentum. To solve for $f$ numerically, Eq. 1 is first discretized physically in the velocity space using a finite set of velocity vectors $\xi_i$ ($i = 0, \ldots, N$), leading to the velocity discrete Boltzmann equation,

$$\frac{\partial f_i}{\partial t} + \xi_i \cdot \frac{\partial f_i}{\partial x} = -\frac{1}{\lambda}\left(f_i - f_i^{eq}\right), \qquad i = 0, \ldots, N, \tag{3}$$

where $f_i(x, t) \equiv f(x, \xi_i, t)$. For simulating two-dimensional flows, the 9-velocity D2Q9 model ($i = 0, \ldots, 8$), and for three-dimensional simulations, both the 15-velocity D3Q15 ($i = 0, \ldots, 14$) and the 19-velocity D3Q19 model ($i = 0, \ldots, 18$) are widely used. Such a low number of collocation points is sufficient to describe the fluid in the near-equilibrium state of low Mach number hydrodynamics. For all these models, the equilibrium distribution function $f_i^{eq}$ is of the form

$$f_i^{eq} = \rho\, w_i \left[ 1 + \frac{3}{c^2}\, e_i \cdot u + \frac{9}{2c^4}\, (e_i \cdot u)^2 - \frac{3}{2c^2}\, u \cdot u \right], \tag{4}$$

with $c = \delta x/\delta t$ and the discrete particle velocities $e_i$. The weighting factors $w_i$ depend only on the lattice model [11]. This discrete equilibrium distribution


function $f^{eq}$ has been derived from the Maxwell-Boltzmann equilibrium distribution function $f^{(0)}$ in such a way that the velocity moments up to fourth order are identical with those of $f^{(0)}$. The (macroscopic) values of density $\rho$ and momentum $\rho u$ can be evaluated as

$$\rho = \int_{-\infty}^{\infty} f \, d\xi = \sum_{i=0}^{N} f_i = \sum_{i=0}^{N} f_i^{eq}, \tag{5}$$

$$\rho u = \int_{-\infty}^{\infty} \xi f \, d\xi = \sum_{i=0}^{N} e_i f_i = \sum_{i=0}^{N} e_i f_i^{eq}. \tag{6}$$

The speed of sound in these models is $c_s = c/\sqrt{3}$ and the pressure is given by the equation of state of an ideal gas,

$$p = \rho c_s^2. \tag{7}$$
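The discrete equilibrium of Eq. 4 and the moment sums of Eqs. 5 to 7 can be checked numerically with a short sketch for the D2Q9 model. The velocity ordering and the weights 4/9, 1/9 and 1/36 are the commonly used D2Q9 values and are an assumption here, since the text does not list them; the function names are hypothetical.

```python
# D2Q9 lattice in lattice units (delta_x = delta_t = 1, hence c = 1).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4   # assumed standard D2Q9 weights


def equilibrium(rho, u, c=1.0):
    """Discrete equilibrium f_i^eq of Eq. 4 for density rho and velocity u."""
    uu = u[0] * u[0] + u[1] * u[1]
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = c * (ex * u[0] + ey * u[1])
        feq.append(rho * w * (1 + 3 * eu / c**2
                              + 4.5 * eu**2 / c**4
                              - 1.5 * uu / c**2))
    return feq


def moments(f, c=1.0):
    """Density and velocity from the discrete sums in Eqs. 5 and 6."""
    rho = sum(f)
    mx = sum(c * ex * fi for (ex, _), fi in zip(E, f))
    my = sum(c * ey * fi for (_, ey), fi in zip(E, f))
    return rho, (mx / rho, my / rho)


rho, u = moments(equilibrium(1.2, (0.05, -0.02)))
pressure = rho * (1.0 / 3.0)   # Eq. 7 with c_s^2 = c^2 / 3
```

Taking the moments of the equilibrium should return the density and velocity it was built from, which is exactly the statement of the right-most equalities in Eqs. 5 and 6.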

To obtain the main equation of the lattice Boltzmann approach, Eq. 3 is discretized numerically in a very special manner. The discretization of space and time is accomplished by an explicit finite difference approximation. By scaling the lattice spacing, the time step and the discrete velocities according to $\delta x = e_{i,\alpha}\,\delta t$, the discretized equations take the following explicit form:

$$f_i(x^* + e_i\,\delta t,\; t^* + \delta t) - f_i(x^*, t^*) = -\frac{1}{\tau}\left[ f_i(x^*, t^*) - f_i^{eq}(x^*, t^*) \right], \tag{8}$$

where $\tau = \lambda/\delta t$ is the dimensionless relaxation time and $x^*$ is a point in the discretized physical space. The right hand side of Eq. 8 is usually called the collision step and the left hand side the streaming step. For the collision step, the equilibrium distribution function has to be calculated at each cell and at each time step from the local density $\rho$ (Eq. 5) and the local macroscopic flow velocity $u$ (Eq. 6).

The Navier Stokes equations (up to second order in space and time) can be derived formally from the lattice Boltzmann equation through the Chapman-Enskog expansion, a standard multi-scale expansion with time and space rescaled and the distribution function $f_i$ expanded up to second order [4, 5, 12]. The relation between the relaxation time $\tau$ and the shear viscosity $\nu$, including a correction for the truncation error due to the discretization, can be obtained from the result of this Chapman-Enskog expansion. As the discretization error is known a priori from this analysis, it can be corrected, and thus the lattice Boltzmann method does not suffer from numerical diffusion as many other finite difference methods do. The final relation for the kinematic viscosity is

$$\nu = \left(\tau - \tfrac{1}{2}\right) c_s^2\, \delta t. \tag{9}$$
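A minimal sketch of one collision and streaming update according to Eq. 8, together with the inversion of Eq. 9 for the relaxation time, might look as follows. It assumes a D2Q9 lattice in lattice units with periodic boundaries and the usual D2Q9 weights; the data layout `f[x][y][i]` and all function names are illustrative choices, not those of any particular solver.

```python
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4   # assumed standard D2Q9 weights


def tau_from_viscosity(nu):
    """Invert Eq. 9 in lattice units (c_s^2 = 1/3, delta_t = 1)."""
    return 3.0 * nu + 0.5


def feq(rho, ux, uy):
    """Equilibrium of Eq. 4 with c = 1."""
    uu = ux * ux + uy * uy
    return [rho * w * (1 + 3 * (ex * ux + ey * uy)
                       + 4.5 * (ex * ux + ey * uy) ** 2 - 1.5 * uu)
            for (ex, ey), w in zip(E, W)]


def lbm_step(f, tau):
    """One update of Eq. 8 on a periodic nx-by-ny grid, f[x][y][i]."""
    nx, ny = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            cell = f[x][y]
            rho = sum(cell)                                        # Eq. 5
            ux = sum(ex * fi for (ex, _), fi in zip(E, cell)) / rho  # Eq. 6
            uy = sum(ey * fi for (_, ey), fi in zip(E, cell)) / rho
            eq = feq(rho, ux, uy)
            for i, (ex, ey) in enumerate(E):
                post = cell[i] - (cell[i] - eq[i]) / tau      # collision
                new[(x + ex) % nx][(y + ey) % ny][i] = post   # streaming
    return new


tau = tau_from_viscosity(0.1)
f = [[feq(1.0 + 0.01 * ((x + y) % 2), 0.0, 0.0) for y in range(4)]
     for x in range(4)]
mass_before = sum(sum(sum(c) for c in col) for col in f)
for _ in range(10):
    f = lbm_step(f, tau)
mass_after = sum(sum(sum(c) for c in col) for col in f)
```

Because the collision conserves the local density and streaming only moves values between cells, the total mass on the periodic grid should stay constant up to rounding.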

As a computational tool, the lattice Boltzmann method differs in various aspects from methods which are directly based on the Navier Stokes equations. According to Yu et al. [13], the major differences are as follows:


1. The Navier Stokes equations are second-order partial differential equations (PDEs); the discrete velocity Boltzmann equation, from which the lattice Boltzmann model is derived, consists of a set of first order PDEs.
2. Navier Stokes solvers inevitably need to treat the nonlinear convective term, u · ∇u; the lattice Boltzmann method totally avoids the nonlinear convective term, because the convection becomes a simple advection (uniform data shift).
3. CFD solvers for the incompressible Navier Stokes equations need to solve the Poisson equation for the pressure. This involves global data communication, while in the lattice Boltzmann method data communication is always local and the pressure is obtained through an equation of state.
4. In the lattice Boltzmann method, the Courant-Friedrichs-Lewy (CFL) number is proportional to δt/δx; in other words, the grid CFL number is equal to unity based on the lattice units of δx = δt = 1. Consequently, the time dependent lattice Boltzmann method is inefficient for solving steady-state problems, because its speed of convergence is dictated by acoustic propagation, which is very slow.³
5. Boundary conditions involving complicated geometries require a careful treatment in both Navier Stokes and lattice Boltzmann solvers. In Navier Stokes solvers, normal and shear stress components require appropriate handling of geometric estimates of normals and tangents, as well as one-sided extrapolations. In lattice Boltzmann solvers, the boundary condition issue arises because the continuum framework, such as the no-slip condition at the wall, does not have a direct counterpart.
6. Since the Boltzmann equation is kinetic-based, the physics associated with molecular level interactions can be incorporated more easily in the lattice Boltzmann model. Hence, the lattice Boltzmann model might be fruitfully applied to micro-scale fluid flow problems.
7. The spatial discretization in the lattice Boltzmann method is dictated by the discretization of the particle velocity space. This coupling between discretized velocity space and configuration space leads to regular square grids. This is a limitation of the lattice Boltzmann methods, especially for aerodynamic applications where both the far field boundary condition and the near wall boundary layer need to be carefully implemented.

³ However, especially in the case of complex geometries, the lattice Boltzmann methods can still be competitive or even faster than Navier Stokes solvers, see e.g. [14, 15].


Because of the attractive features mentioned above and despite the fundamental or current limitations, the lattice Boltzmann method has been particularly successful in simulations of fluid flows involving complicated boundaries and/or complex fluids, such as turbulent external flows over complicated geometries, multi-component fluids in porous media, multi-phase flows, and many other areas (see e.g. [1, 16, 17, 18, 19, 20]).

2.2 Solid Wall Boundary Conditions

The lattice Boltzmann equation is usually solved for all fluid nodes on an equidistant Cartesian mesh. Arbitrarily complex geometries can be represented on this grid with the help of the marker-and-cell approach by simply changing the state of single cells (voxels) from fluid (free) to solid (occupied). Wall boundary conditions can thus be implemented easily within the lattice Boltzmann framework by the so-called bounce-back rule [21], which basically means that particle distributions which would enter a solid node during the streaming step are simply set back to the original cell but with opposite momentum (i.e. $f_{\bar{i}}(x^*, t^* + \delta t) = f_i(x^*, t^*) + Q$ with $e_{\bar{i}} = -e_i$). This results in a no-slip boundary condition, satisfying second order accuracy (with respect to a staircase geometry). The wall is located half-way between the two nodes [22], which allows an easy and efficient handling [14, 23] of arbitrarily complex geometries. For low and moderate Reynolds numbers, the staircase approximation of the geometry does not have a significant influence on the hydrodynamical results even on relatively coarse grids. However, for high Re flows, a geometrically smooth surface is usually necessary. Other types of wall boundary conditions which are not limited to the staircase approximation of the geometry are known in the literature, e.g. boundary fitting [24] or boundary interpolation [25]. A simplified geometric second order boundary condition has recently been suggested by Yu et al. [13].
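The bounce-back rule on a marker-and-cell grid can be sketched as a modified streaming step. The sketch assumes a D2Q9 lattice, periodic wrap-around at the domain edges and a boolean `solid` mask; the table `OPP` of opposite directions and the function name are illustrative assumptions, and the input `post` is taken to hold post-collision values.

```python
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6]   # OPP[i] is the direction with e = -e_i


def stream_with_bounce_back(post, solid):
    """Stream post-collision values post[x][y][i]; solid[x][y] marks wall voxels."""
    nx, ny = len(post), len(post[0])
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            if solid[x][y]:
                continue                      # nothing is stored in wall voxels
            for i, (ex, ey) in enumerate(E):
                tx, ty = (x + ex) % nx, (y + ey) % ny
                if solid[tx][ty]:
                    # Would enter a solid node: return to the original
                    # cell with opposite momentum.
                    new[x][y][OPP[i]] = post[x][y][i]
                else:
                    new[tx][ty][i] = post[x][y][i]
    return new


# Hypothetical channel: walls at y = 0 and y = 3, periodic in x.
nx, ny = 3, 4
solid = [[y in (0, ny - 1) for y in range(ny)] for x in range(nx)]
post = [[[0.0 if solid[x][y] else 1.0 + 0.1 * i + x + 0.01 * y
          for i in range(9)] for y in range(ny)] for x in range(nx)]
new = stream_with_bounce_back(post, solid)
```

Every fluid slot receives exactly one value, either from a fluid neighbor or from its own reflected population, so the mass in the fluid region is conserved and nothing leaks into the wall voxels.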
Compared to the bounce-back rule, all these methods are more complicated and computationally more expensive, and they involve either the extrapolation or the interpolation of distribution values, which can significantly reduce the stability of the method. Furthermore, it is no longer automatically guaranteed that mass is conserved.

2.3 Lattice Boltzmann Model with Improved Incompressibility

In the standard lattice Boltzmann model as described above, density and pressure are directly coupled by the equation of state (Eq. 7), i.e. a pressure drop automatically results in a density decrease. In order to ensure a constant mass flux, the velocity must therefore increase equivalently, leading to an unexpected behavior for incompressible fluids and a non-divergence-free velocity field.


To improve the incompressibility, He & Luo [26] suggested splitting the pressure $p$ into a constant part $p_0$ and a slightly changing perturbation $\delta p$. Now, a distribution function $P_i$ for the pressure can be defined:

$$P_i(x^* + e_i\,\delta t,\; t^* + \delta t) = P_i(x^*, t^*) - \frac{1}{\tau}\left[ P_i(x^*, t^*) - P_i^{eq}(x^*, t^*) \right], \qquad i = 0, \ldots, 18, \tag{10}$$

with the local equilibrium distribution function $P_i^{eq}$

$$P_i^{eq} = w_i \left\{ p + p_0 \left[ \frac{3}{c^2}\, e_i \cdot u + \frac{9}{2c^4}\, (e_i \cdot u)^2 - \frac{3}{2c^2}\, u^2 \right] \right\}. \tag{11}$$

The resulting quadrature formulae for calculating the macroscopic quantities are

$$\text{pressure} \qquad p = \sum_{i=0}^{18} P_i, \tag{12}$$

$$\text{flow velocity} \qquad u = \sum_{i=0}^{18} e_i P_i / p_0. \tag{13}$$
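As a consistency check, the quadrature formulae can be verified for the equilibrium itself. The short derivation below is a sketch that relies on the usual isotropy properties of the D3Q19 weights (zeroth, first, second and third moments), which are assumed here rather than taken from the text.

```latex
% Assumed lattice identities:
%   \sum_i w_i = 1, \quad \sum_i w_i e_{i\alpha} = 0, \quad
%   \sum_i w_i e_{i\alpha} e_{i\beta} = \tfrac{c^2}{3}\,\delta_{\alpha\beta}, \quad
%   \sum_i w_i e_{i\alpha} e_{i\beta} e_{i\gamma} = 0 .
\begin{aligned}
\sum_{i=0}^{18} P_i^{eq}
  &= p + p_0\left[\, 0 + \frac{9}{2c^4}\cdot\frac{c^2}{3}\,u^2
                     - \frac{3}{2c^2}\,u^2 \right] = p ,\\
\sum_{i=0}^{18} e_i P_i^{eq}
  &= p_0\,\frac{3}{c^2}\cdot\frac{c^2}{3}\,u = p_0\,u .
\end{aligned}
```

Both results agree with Eqs. 12 and 13: the velocity-dependent terms cancel in the pressure sum, and dividing the first moment by $p_0$ returns the flow velocity.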

This set of equations still recovers the time-dependent incompressible Navier Stokes equations in the low Mach number limit in the same way as the original lattice Boltzmann equations do, but with improved incompressibility. The treatment of the wall boundary conditions can be kept without changes. Choosing $p_0 = 1$ simplifies the calculation of the local velocity by eliminating the division.

2.4 Multi-Relaxation-Time (MRT) Lattice Boltzmann Schemes

The popular single-relaxation-time (SRT) approximation (BGK model, Eq. 2) described in Sec. 2.1 represents one of the simplest expressions for the collision operator of the Boltzmann equation. However, the BGK model has some shortcomings due to its simplicity. In particular, as the bulk and shear viscosity have to be identical due to the SRT approximation, it is difficult to damp out the acoustic modes in the transient pressure field [13, 27]. Large pressure oscillations occur in particular near the numerical stability limit, i.e. with increasing Reynolds number, in the vicinity of singularities or due to severe errors in the initial condition. These pressure oscillations can, in the end, cause tremendous stability problems or at least produce noisy solutions [20]. Multi-relaxation-time (MRT) models, also called the generalized lattice Boltzmann approach, can help to increase the stability and thus the computational efficiency significantly.


The basic idea of the MRT models is to define suitable moments of the particle velocity distribution function $f_i$ with regard to the discrete particle velocities $e_i$. At each time step, the particle velocity distribution function $f_i$ is mapped from the discrete velocity space to the moment space. The relaxation is then done on these moments, with suitable equilibrium values and relaxation time constants for each of them. Before streaming, the values are transferred back to the velocity space, and the propagation and handling of boundary conditions can be done as in an SRT model. For more detail on the MRT models, including all necessary equations, the interested reader is referred to [13, 20, 27, 28, 29].

3 Synthetic Generation of Random Sphere Packings and Simulation of Transport Processes therein

In CFD simulations, the first step is the pre-processing with the mesh generation. This is typically a time-consuming process, regardless of whether CAD data of the geometry of interest is available or not. If the geometry is available as CAD data from the design process, the data often cannot be used directly for the CFD simulation because it might contain too many details or, moreover, because the designers do not care about perfectly closed surfaces, which are a fundamental requirement for the mesh generation. Through the integration of the design and the simulation process, there might be some improvements in the future. If a marker-and-cell (or voxel) based CFD approach is used, for example on an equidistant Cartesian mesh, the automatic grid generation is facilitated by digital mockup techniques, and even advanced traditional mesh generation tools (e.g. pro*am from the CD adapco Group) may be used to obtain the desired voxel representation of the geometry.

In addition to this classical way of obtaining the geometrical representation, several innovative methods are also applicable in certain situations. If the model is physically available, computed tomography can be used to digitize the object without destroying it. From the bitmap pictures of equidistant slices, the 3D structure can be reconstructed [17]. A similar non-invasive approach is possible with nuclear magnetic resonance imaging (MRI). In contrast to computed tomography, this method can provide not only the geometrical structure but also information on transport processes (e.g. velocity field or diffusion processes), which could be used for validating the numerical methods.

In the following subsection, the possibility of synthetically generating the geometrical structure for the CFD simulation by an independent simulation process is described. Thereafter, results of lattice Boltzmann simulations of the flow through such geometrically complex random packings of spheres are discussed, together with the possibility of a concurrent visualization of transient data (to reduce the amount of data to be stored at each time step).
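To make the marker-and-cell (voxel) representation concrete, the following sketch marks the voxels of an equidistant Cartesian grid as solid whenever their center lies inside a sphere and estimates the resulting porosity. The cell-center criterion, the cubic box and the function names are assumptions made for illustration; production tools may use different inside tests.

```python
import math


def voxelize_spheres(spheres, box, n):
    """Return an n*n*n grid over a cube of edge `box`; True marks solid voxels.

    A voxel is marked solid if its center lies inside any sphere (cx, cy, cz, r).
    """
    h = box / n
    grid = [[[False] * n for _ in range(n)] for _ in range(n)]
    for ix in range(n):
        x = (ix + 0.5) * h
        for iy in range(n):
            y = (iy + 0.5) * h
            for iz in range(n):
                z = (iz + 0.5) * h
                grid[ix][iy][iz] = any(
                    (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r * r
                    for cx, cy, cz, r in spheres)
    return grid


def porosity(grid):
    """Fraction of fluid (free) voxels."""
    cells = [v for plane in grid for row in plane for v in row]
    return 1.0 - sum(cells) / len(cells)


# One sphere inscribed in the unit cube, 30 voxels per diameter.
grid = voxelize_spheres([(0.5, 0.5, 0.5, 0.5)], box=1.0, n=30)
eps = porosity(grid)
exact = 1.0 - math.pi / 6.0   # analytical void fraction for this test case
```

For a single inscribed sphere the voxel estimate can be compared with the analytical void fraction, which gives a quick impression of the staircase error at a given resolution.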


3.1 Synthetic Generation of Random Packings

To systematically analyse transport phenomena in randomly packed beds of spheres, a fast and efficient way of generating and discretizing such packings is required. Studies of the structure of randomly packed systems have been an attractive topic in both physical and computational experiments for a long time [30, 31, 32]. For systematic investigations, a synthetic generation of such packings by computer simulations is desirable, as all experimental methods are more time-consuming and expensive. With the subsequent CFD simulation in mind, we adapted the Monte Carlo process suggested by Soppe [32]. We selected this approach for synthetically generating random packings of spheres because of its simplicity and because, compared to other methods of synthetically generating packings (see e.g. [33]), the process of Soppe is able to produce packings with lower global porosities, in close agreement with experimental findings.

Our initial implementation [34] worked on a 3-D Cartesian grid with proper resolution for the subsequent lattice Boltzmann simulation. In the meantime, the discrete Cartesian grid has been abandoned and the spheres are allowed to be located anywhere (as long as they do not overlap). In this way, the geometrical discretization is decoupled from the actual packing process. This allows convergence studies in the CFD simulation or, in the future, the adaptation to geometrical second order curved boundary conditions in the lattice Boltzmann simulation.

The Monte Carlo packing algorithm consists of two consecutive steps. In the first step, spherical particles of uniform size or a given size distribution are successively dropped into a cylindrical confining tube. They stop as soon as they come into contact with any other sphere. During this raining process, a loose (and non-physical) packing is generated. In the second step, the packing is compressed (compressing step). Single spheres are randomly chosen and moved a small distance if this is possible without creating an overlap. Movements in the direction of gravity have an increased probability. These compression steps are repeated several thousand times. The process stops when no further reduction of the potential energy of the spheres occurs and all spheres are in a mechanically stable position.

Due to the random character of the raining and compressing process, a reinitialization of the random number generator (e.g. restarting the simulation) results in a slightly different packing, i.e. the relative positions of the spheres are slightly different. This is an important feature of the packing simulation, as it allows for the investigation of the influence of the local structure on the local and global behavior of the randomly packed fixed-bed reactor. Figure 1 shows, as an example, synthetically generated random packings with an aspect ratio of 5.6.

Fig. 1. Visualization of three slightly different synthetically generated packings with an aspect ratio of 5.6. The unit box indicates the tube diameter

The comparison of the (axially and circumferentially averaged) radial porosity profile extracted from simulated packings with an aspect ratio of 5.6 with experimental data of Benenati & Brosilow [30] for the same diameter aspect ratio, as well as the comparison of simulation results for an aspect ratio of 5.96 and a length of about 100 sphere layers with experimental data of Mueller [35] in Fig. 2, demonstrate as an example that the simulation can reflect the local features of experimental findings at a quantitative level. The resulting porosity profiles are strongly oscillating, starting with a porosity of unity near the wall and reaching a first local minimum at a wall distance of about 0.5 particle diameters. At the first local maximum at a wall distance of about 1, a discontinuity is clearly observed which has also been found in several other recent studies [33, 35]. The globally observed oscillating porosity profile, but also the strong local structural inhomogeneities, have a strong influence on the local and the overall performance of such randomly packed reaction, separation or purification units. Especially in the case of small aspect ratios, the calculation of global averages and the application of volume averaging methods seem to be questionable. Therefore, only detailed 3D simulations can provide information with sufficient accuracy in that case.
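The two-step procedure described in this subsection (raining followed by compression) can be illustrated with a strongly simplified sketch. It is not the algorithm of Soppe or the authors' implementation: it uses equal disks in a two-dimensional box instead of spheres in a tube, a fixed number of trial moves instead of an energy-based stopping criterion, and hypothetical parameter names such as `p_down`.

```python
import random


def overlaps(p, others, d):
    """True if a disk of diameter d at p overlaps any disk in `others`."""
    return any((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 < d * d for q in others)


def rain(n, width, d, rng, dz=0.05):
    """Step 1: drop disks one by one; each stops at its first contact."""
    placed = []
    for _ in range(n):
        x = rng.uniform(d / 2, width - d / 2)
        z = max((q[1] for q in placed), default=0.0) + 2 * d  # start above the bed
        while z - dz >= d / 2 and not overlaps((x, z - dz), placed, d):
            z -= dz
        placed.append((x, z))
    return placed


def compress(disks, width, d, rng, moves=2000, step=0.05, p_down=0.7):
    """Step 2: random small moves, biased downwards, accepted only without overlap."""
    disks = list(disks)
    for _ in range(moves):
        i = rng.randrange(len(disks))
        x, z = disks[i]
        dx = rng.uniform(-step, step)
        dz = -rng.uniform(0, step) if rng.random() < p_down else rng.uniform(0, step)
        cand = (x + dx, z + dz)
        inside = d / 2 <= cand[0] <= width - d / 2 and cand[1] >= d / 2
        if inside and not overlaps(cand, disks[:i] + disks[i + 1:], d):
            disks[i] = cand
    return disks


rng = random.Random(1)          # re-seeding gives a slightly different packing
loose = rain(20, width=5.0, d=1.0, rng=rng)
dense = compress(loose, width=5.0, d=1.0, rng=rng)
```

Changing the seed of the random number generator yields a different but statistically similar packing, which mirrors the reinitialization effect mentioned above.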


Fig. 2. Comparison of the radial porosity profile extracted from the Monte Carlo packing simulations with experimental data. For the simulated values, the mean values as well as the standard deviation of ten samples are shown

3.2 Results of Flow Simulations in Random Packings of Spheres

To obtain the flow field in the complex 3D porous structure of random packings, our lattice Boltzmann solver BEST, which is based on the 19-velocity D3Q19 SRT lattice Boltzmann model and the bounce-back boundary condition [36], was used. The local flow distribution can preferably be validated and discussed using radial velocity profiles. In the past, numerous attempts have been made to extract information on velocity distributions inside the packing from measurements carried out a short distance behind the packing, as the inner domain is not accessible when using traditional measuring methods such as hot wire anemometry [37]. However, the extrapolation of these values to the inside of the packing seems to be very questionable. Nowadays, measuring techniques like Laser-Doppler-anemometry (LDA) or nuclear magnetic resonance (NMR) methods allow, under certain conditions, the measurement of local velocities inside the packing.

We compared [38, 39] our results of the CFD simulation with data from Krischke [40], who performed LDA measurements inside packings of spherical glass particles. The simulations were carried out for the corresponding aspect ratios of 4 ($L/d_P = 20$, $\varepsilon = 0.471$, approximately 250 spheres in total) and 6.15 ($L/d_P = 21$, $\varepsilon = 0.439$, approximately 650 spheres in total), respectively. Each sphere was discretized with a resolution of 30 voxels per diameter. As the complete 3D structure of the packings used in the experiments was not available, we synthetically generated packings that were as similar as possible. In Figs. 3 and 4, experimental data of two vertical measuring planes (i.e. axially averaged, but at certain angular positions) are shown (LDA plane 1 and 2) and compared with two vertical planes extracted from our simulation results (LB-Sim plane 1’ and 2’). Additionally, the axially and circumferentially averaged radial velocity profile is given (LB-Sim, average). As experimental data

Efficient flow simulation on high performance computers


is only available up to a dimensionless wall distance of 1.5 for the larger aspect ratio, the abscissa in Fig. 4 is cut at this value. In general, a good agreement of the simulated and experimental data can be observed, in particular close to the wall where the highest ordering effect can be observed. The profiles show the typical oscillating characteristics as expected from the radial porosity profile. Starting from zero velocity at the wall, the first maximum of about 2.5 times the superficial velocity is reached in the near wall region (where the porosity reaches values near unity). The position of the first minimum at about 0.5 particle diameters from the wall again corresponds exactly to the extreme value of the porosity profile. Especially in this near wall region, the simulation results partly match the experimental data quantitatively (e.g. LB-Sim plane 1’ and LDA plane 1 and 2 in Fig. 3, LB-Sim plane 1’ and LDA plane 2 in Fig. 4). In the inner region, deviations exist for both aspect ratios, probably due to the fact that the geometrical structures used in the simulation and the experiment were not identical.
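The axially and circumferentially averaged radial profile discussed above can be sketched as a simple binning of velocity samples by their dimensionless wall distance. This is a minimal illustration, not the post-processing used in BEST: the function name, the sample format `(x, y, u)` and the bin width are assumptions made for this example, and the velocities are assumed to be already normalised by the superficial velocity.

```python
import math

def radial_profile(samples, tube_radius, particle_diameter, bin_width=0.1):
    """Average velocity samples in annular bins of dimensionless wall distance.

    samples: iterable of (x, y, u), with x and y measured from the tube axis
             and u the axial velocity divided by the superficial velocity.
             Samples from all axial positions are simply pooled (axial average).
    Returns a list of (bin_centre, mean_u), sorted by wall distance.
    """
    sums = {}
    counts = {}
    for x, y, u in samples:
        r = math.hypot(x, y)
        if r > tube_radius:
            continue  # outside the tube, not part of the flow domain
        # wall distance in particle diameters, 0 at the tube wall
        wall_distance = (tube_radius - r) / particle_diameter
        k = int(wall_distance / bin_width)
        sums[k] = sums.get(k, 0.0) + u
        counts[k] = counts.get(k, 0) + 1
    return [((k + 0.5) * bin_width, sums[k] / counts[k]) for k in sorted(sums)]
```

Whether solid voxels (with zero velocity) are included in `samples` decides whether the result is a superficial or an interstitial average; the sketch leaves that choice to the caller.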

Fig. 3. Comparison of simulated and measured velocity profiles in different vertical planes, aspect ratio D/d = 4 and Re_d ≈ 50

Fig. 4. Comparison of simulated and measured velocity profiles in different vertical planes, aspect ratio D/d = 6.15 and Re_d ≈ 50

The pressure drop in such random packings can also be derived from the simulations and analyzed in detail. However, this topic is discussed at length elsewhere [38, 41, 42, 43].

3.3 Concurrent Visualization with RVSlib

The enormous amount of data produced at each time step of a transient simulation can be visualized efficiently by a server-side data reduction carried out in batch mode or interactively with a steering client. Instead of storing the whole dataset, only the relevant information is extracted during the ongoing simulation, thus bypassing the major bottleneck, the bandwidth to the hard disk. The real-time visual simulation library RVSlib from NEC [44] is a system for visualizing computational results concurrently with the ongoing simulation for a broad range of scientific computing applications such as computational fluid dynamics and structural analysis. The basic concept (Fig. 5) is a real-time data reduction to the required image (according to pre-defined scenarios or interactively) instead of writing the whole dataset to the hard disk and carrying out an a-posteriori visualization. RVSlib is a client-server system consisting of RVSlib/Server and RVSlib/Client. These two components can be started either on the same machine or on different machines. The RVSlib/Server provides a set of FORTRAN subroutines to be incorporated into user applications. The client enables the user to display an image created by the RVSlib/Server and to change simulation parameters through a GUI (graphical user interface). Alternatively, movies can be generated in batch mode by using a script that steers the camera position etc. RVSlib visualization routines have been integrated into our existing lattice Boltzmann solver BEST by adding a few subroutine calls. The additional CPU time required for image generation and data compression during the simulation is, depending on what has to be visualized, usually only a few percent of the total CPU time. Within the lattice Boltzmann algorithm, all local flow quantities (velocity, pressure, species concentration) are calculated anyway from the density distributions during the simulation procedure (once per iteration). For the visualization, therefore, only additional arrays have to be defined to store these quantities and later provide them to the RVSlib routine calls at the end of the main loop. So, the additional overhead is restricted to the relatively small amount

[Figure 5, panels (a) and (b). Labels recoverable from the diagram: HPC server and client workstation (WS) connected by a network (LAN/WAN), exchanging control data and compressed images. User's CFD code: CALL RVS_INIT, CFD iteration, CFD computation, CALL RVS_BFC, CALL RVS_MAIN, CALL RVS_TERM; user routines RVS_USER_OBJECT and RVS_USER_TRACER. RVSLIB server: RVS initialization, client activation, client communication, graphics processing, mapping, rendering, image compression, client termination, RVS termination. RVSLIB client: server communication, image expansion, GUI, visualization-based input, local rendering, image display; CRT with graphics window, action window and control panel.]

Fig. 5. a) RVSlib client/server concept; b) implementation scheme [44]


of memory which is necessary to store these arrays and the CPU-time due to the image rendering. Figure 6 shows the evolution of the velocity distribution in a packing with aspect ratio 5.6 during the start-up period. Information about these changes can help to better understand the influence of changing the mass flux or switching the feed on and off. Without a concurrent visualization, a lot of data would have to be written to disk for each time-step instead of just small images.
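The idea of reducing data during the run, instead of writing every time step to disk, can be sketched as follows. This is not the RVSlib interface (which is called from FORTRAN) and not code from BEST; `reduce_to_image`, `run_simulation` and the block-averaging "renderer" are hypothetical stand-ins chosen only to show where the reduction sits in the main loop.

```python
def reduce_to_image(field, width, height):
    """Block-average a 2-D field (list of rows) down to width x height values.

    Assumes the field has at least `height` rows and `width` columns, so that
    every averaging block contains at least one value.
    """
    ny, nx = len(field), len(field[0])
    image = []
    for j in range(height):
        y0, y1 = j * ny // height, (j + 1) * ny // height
        row = []
        for i in range(width):
            x0, x1 = i * nx // width, (i + 1) * nx // width
            block = [field[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        image.append(row)
    return image

def run_simulation(step, state, n_steps, every, width=4, height=4):
    """Advance `state` n_steps times and keep only reduced images.

    step:  hypothetical callable performing one solver iteration.
    every: an image is produced every `every` iterations, at the end of the
           main loop body; the full field is never stored.
    """
    frames = []
    for it in range(1, n_steps + 1):
        state = step(state)
        if it % every == 0:
            frames.append(reduce_to_image(state, width, height))
    return state, frames
```

The storage per retained frame is fixed by the image size (`width * height` values here) and is independent of the grid size, which is the point made in the text about writing small images rather than full datasets.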

Fig. 6. Velocity distribution in a packing with aspect ratio 5.6 during the startup period (t∗ =10, 50, 200, 1000) visualized concurrently with the simulation using RVSlib

4 Experience with “Commodity Off-The-Shelf” (COTS) HPC Hardware

In recent years, the performance of commodity off-the-shelf (COTS) hardware, particularly PCs, has increased dramatically and many PC clusters have been


T. Zeiser and F. Durst

installed worldwide. PC clusters, or RISC-based systems in general, are much less expensive than traditional high performance computers such as vector-parallel computers (e.g. NEC’s SX series, Fujitsu’s VPP series or Cray’s T90 and X1) or specially designed massively parallel computers (e.g. the IBM SP2 or Cray T3E). With regard to peak performance or even Linpack performance, commodity PC cluster systems have almost caught up with or even outperform traditional special-purpose high performance computer systems, as can clearly be seen from recent Top 500 lists at http://www.top500.org/. However, for complex CFD simulations, the peak or Linpack performance does not tell much. Common commodity CPUs usually have far fewer internal registers, and the bandwidth of such systems to the main memory is much lower than, for example, in vector-parallel computers. The available compilers and tools, as well as their quality, can also significantly influence software development and optimization cycles and, in the end, the sustainable performance. With today’s computers, the limits for realizable large-scale computations are given by the affordable compute time, not by the available main memory. With regard to memory, the limit is not the total amount of memory available but the data transfer bandwidth. These changes in technology must also be reflected in the implementation of software packages. To detect bottlenecks, advanced debugging and performance measurement tools are required. For traditional high performance computer systems, high-quality tools of this kind are available from the system manufacturers. However, for commodity clusters, such tools are still scarce or produce questionable results. Looking for example at the common IA32 platform, the GUI of Intel’s VTune (see http://developer.intel.com/) at the time of writing still requires a host running MS Windows; only the data collector can run on a Linux box.
The multi-platform performance application programming interface PAPI (see http://icl.cs.utk.edu/projects/papi/) has problems with measuring SSE2 instructions on current Intel Xeon processors. Up to now, it is unclear whether this is a problem of PAPI or a feature of the hardware performance counters of the processor. The lattice Boltzmann algorithm is quite simple, involving only rather few floating point operations per memory access, as shown in Sec. 2 above. The most crucial part of the complete algorithm consists of the collision and streaming step (Eq. 8), which can be merged. This results in one big loop which requires a large number of internal registers to store intermediate results [45]. In addition, memory access cannot be of stride one due to the stencil of the lattice Boltzmann discretization. Therefore, details of the array layout can significantly influence the performance. A simple lattice Boltzmann solver consists of two (full) arrays storing the particle velocity distribution functions of all nodes after relaxation (but before propagation) and the final values after streaming, respectively. For the layout,


any permutation of (x, y, z, i, s) might be reasonable, with the spatial coordinates x, y, z, the discrete velocity direction i and the state s, which can be post-relaxation or post-streaming. Each arrangement has certain advantages on different platforms, e.g. optimizing the collision process or the streaming step. It is difficult to predict in advance the optimal layout for a certain hardware or even hardware–compiler combination. In addition, multi-way blocking algorithms can give a boost on some architectures (but this also makes the code much more complex). Preliminary results on this topic can be found in [46, 47, 48]. Another strategy is to abandon the full arrays and switch to 1-D lists. In this case, the cells can be sorted by type (e.g. inlet, outlet, fluid or wall boundary solid node). The nodes inside a solid block can easily be eliminated in this way; however, the connectivity of all cells has to be stored and the algorithm now relies heavily on indirect addressing [49]. Further single-CPU performance aspects and optimization strategies are discussed in detail in a forthcoming publication [50].
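The effect of the index order can be made concrete by computing the memory stride of each index for a given permutation of (x, y, z, i, s). The sketch below is an illustration only: the helper names are hypothetical, and the convention that the last index named in `layout` varies fastest is an assumption of this example (a FORTRAN code would name the fastest index first).

```python
def strides(layout, sizes):
    """Element strides of a dense array for a given index order.

    layout: string such as "szyxi" naming the indices, slowest first;
            the LAST index varies fastest in memory (assumed convention).
    sizes:  dict mapping each index name to its extent.
    """
    result = {}
    stride = 1
    for name in reversed(layout):
        result[name] = stride
        stride *= sizes[name]
    return result

def linear_index(layout, sizes, coords):
    """Position of the element with the given coordinates in linear memory."""
    st = strides(layout, sizes)
    return sum(st[name] * coords[name] for name in layout)
```

For a grid of 100 x 100 x 100 nodes with 19 directions and 2 states, the layout "szyxi" gives stride 1 for i and stride 19 for x, so the 19 distributions of one node are contiguous, which suits the collision. The layout "sizyx" gives stride 1 for x and stride 1,000,000 for i, so one direction is contiguous along x, which suits the streaming step and long vector loops.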

5 Promotion of the Application of High Performance Computers: The Bavarian Competence Network for Technical and Scientific High Performance Computing (KONWIHR)

Many technical problems in natural and engineering science can be described by complex mathematical and physical models which theoretically can be solved with the help of computers. However, outstanding problems such as the flow in complex geometries, the simulation of turbulent flows or the prediction of earthquakes require outstanding computational resources. The combination of today’s computational performance with recent numerical algorithms and physical models, and their application to the solution of technical and scientific problems, is what is meant by scientific high performance computing (HPC). With the installation of the German top-level compute server in Bavaria (HLRB), the Hitachi SR8000-F1 installed at the Leibniz Computing Center Munich in spring 2000, and its upgrade at the beginning of 2002, a major breakthrough was achieved. With a performance of over two trillion compute operations per second (2 TFlop/s), this is still a fast civil installation used for general research. However, the efficient use of modern supercomputers is closely connected with extensive and qualified user training and user support as well as special promotion of high performance computing techniques. The Competence Network for Technical and Scientific High Performance Computing in Bavaria (KONWIHR; see http://konwihr.in.tum.de/ and http://www.konwihr.uni-erlangen.de/), a successor of the Bavarian Consortium


for High Performance Scientific Computing (FORTWIHR), was therefore established in May 2000 with a total budget of about 4 million EUR for four years to extend the deployment of this innovative high performance computing technology in scientific and industrial research and development projects. This goal is pursued by promoting scientific co-operation in the field of high performance computing, in particular with relevance to the German Federal top-level compute server in Bavaria (HLRB). Furthermore, KONWIHR provides funding for training courses as well as technical meetings, workshops and conferences that help to transfer the use of this technology to new application areas. Presently about 20 larger scientific projects are funded through KONWIHR. They include, among others, research in the fields of parallel computing, computational fluid mechanics, turbulence, seismic simulations, scientific visualization, computer chemistry, structural mechanics, computational physics, material sciences, and bio-engineering. These strategic projects mainly focus on the application and integration of numerical software for the solution of questions with direct scientific or industrial relevance. Every project is evaluated annually by an external advisory board with experts from universities, research institutes and industry to ensure high scientific quality. The offices in Munich and Erlangen co-ordinate the activities and serve as contact points. Besides these long-term projects, short evaluation projects and numerous workshops have been successfully initiated. Thus, training and high-level teaching is one of the long-term activities of KONWIHR, which ensures that scientific high performance computing methods will also be used in the industrial research and development of the future.
Scientific high performance computing is an interdisciplinary technology including contributions from computer science, mathematics, engineering science, chemistry and physics. This aspect is also reflected in the structure of KONWIHR. The competence network comprises project groups from different faculties of major Bavarian universities and several industrial enterprises. The work groups bring together mathematicians, computer scientists and engineers as well as natural scientists. The goal of KONWIHR is to extend the use of high performance computing in basic research as well as applied technologies. This goal is to be achieved by directly promoting interdisciplinary co-operation between engineers and scientists from the above faculties. These co-operations are planned and co-ordinated closely among the contributing academic institutions and industrial enterprises, and with the computing centre of the Bavarian Academy of Science in Munich (LRZ) or the computing center of the University of Erlangen (RRZE). Additional support is granted for the dissemination of knowledge through workshops, conferences or academies. In this concept, the support projects located at the compute centers play an important role as they provide a local and persistent center of excellence for general scientific computing aspects, including HPC support (programming, debugging, optimization of HPC codes), HPC training (HPC lectures and


tutorials for students and users) and HPC information (information about the latest developments in HPC: new architectures, programming models, available resources, global trends, . . . ).

Acknowledgements

This work is financially supported by the German Research Foundation (DFG) and the Competence Network for Technical and Scientific High Performance Computing in Bavaria KONWIHR. The initial work on the Monte Carlo Packing Generation Tool (McPackGen) was started by Yong-Wang Li (ICCCAS, China) during his stay at the University of Erlangen-Nuremberg in the framework of an Alexander von Humboldt (AvH) foundation fellowship. Many improvements and validation studies were contributed by Martin Steven during his Master’s thesis [34]. The benchmark computations were done in close cooperation with the team of Gerhard Wellein from the Regional Computing Center Erlangen. Benchmark calculations on a NEC SX-6 and a NEC 16-way Itanium2 system were kindly provided by Josef Weigl from NEC HPCE. The authors would also like to acknowledge the support from and helpful discussions with numerous colleagues, especially Jörg Bernsdorf (NEC CCRLE), Hannsjörg Freund (TC1) and Florian Huber during his Master’s thesis [39].

References

1. Chen S, Doolen GD (1998) Annu Rev Fluid Mech 30:329–364
2. Hardy J, de Pazzis O, Pomeau Y (1976) Phys Rev A 13(5):1949–1961
3. Frisch U, Hasslacher B, Pomeau Y (1986) Phys Rev Lett 56(14):1505–1508
4. Frisch U, d’Humières D, Hasslacher B, Lallemand P, Pomeau Y, Rivet J-P (1987) Complex Syst 1:649–707
5. He X, Luo L-S (1997) Phys Rev E 55(6):6333–6336
6. He X, Luo L-S (1997) Phys Rev E 56(6):6811–6817
7. Wolfram S (1986) J Stat Phys 3/4:471–526
8. Kadanoff LP (1986) Phys Today 39:7–9
9. Chopard B, Droz M (1998) Cellular Automata Modeling of Physical Systems. In: Collection Aléa-Saclay: Monographs and Texts in Statistical Physics. Cambridge University Press, Cambridge, United Kingdom
10. Bhatnagar P, Gross EP, Krook MK (1954) Phys Rev 94(3):511–525
11. Qian YH, d’Humières D, Lallemand P (1992) Europhys Lett 17(6):479–484
12. Chapman S, Cowling TG (1995) The Mathematical Theory of Non-Uniform Gases. Cambridge University Press, Cambridge, United Kingdom
13. Yu H, Luo L-S, Girimaji SS (2003) Int J Comp Eng Sci 3(1):73–87
14. Bernsdorf J, Durst F, Schäfer M (1999) Int J Numer Meth Fluids 29:251–264
15. Bhandari V (2002) Detailed investigations of transport properties in complex reactor components. Master’s thesis, Lehrstuhl für Strömungsmechanik, Universität Erlangen-Nürnberg
16. Bernsdorf J, Brenner G, Zeiser T, Lammers P (2001) Perspectives of the lattice Boltzmann method for industrial applications. In: Jenssen C, Kvamdal T, Andersson H, Pettersen B, Ecer A, Periaux J, Satofuka N, Fox P (eds) Parallel Computational Fluid Dynamics 2000, Trends and Applications. Proceedings of the Parallel CFD 2000 Conference, Elsevier
17. Bernsdorf J, Günnewig O, Hamm W, Münker M (1999) GIT Labor-Fachzeitschrift 4:387–390
18. Brenner G, Zeiser T, Durst F (2002) Chem-Ing-Tech 74(11):1533–1542
19. Brenner G, Zeiser T, Lammers P, Bernsdorf J, Durst F (2001) ERCOFTAC Bulletin 50:29–34
20. Krafczyk M, Tölke J, Luo L-S (2003) Int J Mod Phys B 17(1&2):33–40
21. Ziegler DP (1993) J Stat Phys 71(5/6):1171–1177
22. Ladd AJC (1994) J Fluid Mech 271:285–309
23. Inamuro T, Yoshino M, Ogino F (1995) Phys Fluids 7(12):2928–2930
24. Filippova O, Hänel D (1998) J Comput Phys 147:219–228
25. Bouzidi M, Firdaouss M, Lallemand P (2001) Phys Fluids 13(11):3452–3459
26. He X, Luo L-S (1997) J Stat Phys 88(3/4):927–944
27. Lallemand P, Luo L-S (2000) Phys Rev E 61(6):6546–6562
28. d’Humières D (1992) Generalized lattice-Boltzmann equations. In: Shizgal BD, Weaver D (eds) Rarefied Gas Dynamics: Theory and Simulation, Progress in Astronautics and Aeronautics, Washington AIAA:450–458
29. d’Humières D, Ginzburg I, Krafczyk M, Lallemand P, Luo L-S (2002) Phil Trans R Soc Lond A 360(1792):437–452
30. Benenati R, Brosilow C (1962) AIChE Journal 8(3):359–361
31. Jodrey W, Tory E (1981) Powder Technol 20:111–118
32. Soppe W (1990) Powder Technol 62:189–196
33. Limberg C (2002) Computergenerierte Kugelschüttungen in zylindrischen Rohren als Basis für eine differenzierte Modellierung von Festbettreaktoren. Ph.D. thesis, TU Cottbus
34. Steven M (2001) Detaillierte Simulation und Analyse der Struktur von Katalysatorschüttungen und der lokalen Transportprozesse. Diplomarbeit, Lehrstuhl für Technische Chemie I / Lehrstuhl für Strömungsmechanik, Universität Erlangen-Nürnberg
35. Mueller D (1997) Powder Technol 92:179–183
36. Zeiser T, Brenner G, Durst F (2003) Application of the lattice Boltzmann CFD method on HPC systems to analyse the flow in fixed-bed reactors. In: Krause E, Jäger W (eds) High Performance Computing in Science and Engineering ’02, Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2002, Springer 439–450
37. Bey O, Eigenberger G (1997) Chem Eng Sci 57(8):1365–1376
38. Freund H, Zeiser T, Huber F, Klemm E, Brenner G, Durst F, Emig G (2003) Chem Eng Sci 58(3-6):903–910
39. Huber F (2002) Simulation und experimentelle Untersuchungen lokaler Transportprozesse in Festbettreaktoren. Master’s thesis, Lehrstuhl für Technische Chemie I / Lehrstuhl für Strömungsmechanik, Universität Erlangen-Nürnberg
40. Krischke A (2001) Modellierung und experimentelle Untersuchung von Transportprozessen in durchströmten Schüttungen, VDI Fortschritt-Berichte, Reihe 3, Vol. 713. VDI-Verlag, Düsseldorf
41. Bernsdorf J, Brenner G, Durst F (2000) Comput Phys Commun 129(1-3):247–255
42. Bernsdorf J, Brenner G, Zeiser T, Lammers P, Durst F (2001) Numerical analysis of the pressure drop in porous media flow using the lattice Boltzmann computational technique. In: Satofuka N (ed) Computational Fluid Dynamics 2000, Proceedings of the First International Conference on Computational Fluid Dynamics, ICCFD, Kyoto, Japan, 10-14 July 2000. Springer 493–498
43. Zeiser T, Steven M, Freund H, Lammers P, Brenner G, Durst F, Bernsdorf J (2002) Phil Trans R Soc Lond A 360(1792):507–520
44. Takei T, Matsumoto H, Muramatsu K, Doi S (2001) Parallel vector performance of concurrent visualization system RVSLIB on SX-4. In: Proceedings of the 3rd Pacific Symposium on Flow Visualization and Image
45. Zeiser T, Brenner G, Lammers P, Bernsdorf J (2001) Performance aspects of lattice Boltzmann methods for application in chemical engineering. In: Jenssen C, Kvamdal T, Andersson H, Pettersen B, Ecer A, Periaux J, Satofuka N, Fox P (eds) Parallel Computational Fluid Dynamics 2000, Trends and Applications. Proceedings of the Parallel CFD 2000 Conference, Elsevier
46. Wilke J (2002) Cache optimizations for the lattice Boltzmann method in 2D. Studienarbeit, Lehrstuhl für Informatik 10 (Systemsimulation), Universität Erlangen-Nürnberg
47. Igelberger K (2003) Cache optimizations for the lattice Boltzmann method in 3D. Studienarbeit, Lehrstuhl für Informatik 10 (Systemsimulation), Universität Erlangen-Nürnberg
48. Pohl T, Kowarschik M, Wilke J, Igelberger K, Rüde U (2003) Parall Process Lett 13(4):549–560
49. Freudiger S (2001) Effiziente Datenstrukturen für Lattice Boltzmann-Simulationen in der computergestützten Strömungsmechanik. Diplomarbeit, Lehrstuhl für Bauinformatik, TU-München
50. Zeiser T, Wellein G, Hager G, Donath S, Deserno F, Lammers P, Wierse M (2004) Optimized lattice Boltzmann kernels as testbeds for processor performance (submitted to SuperComputing Conference)

Simulation of problems with free surfaces by a boundary element method

K.E. Afanasiev and S.V. Stukolov

Kemerovo State University, Krasnaya ul. 6, 650043 Kemerovo, Russia

Summary. The paper presents some problems in the hydrodynamics of an ideal incompressible fluid with a free surface. All presented solutions were obtained by boundary element methods. The main directions of the numerical research are described, the numerical results that are most interesting from the authors’ point of view are given for several specific problems, and a brief review of some works by other authors is presented. New classes of problems in hydrodynamics with free boundaries and the implementation of the algorithms by means of high-performance parallel computations are discussed.

1 Introduction

Problems in the hydrodynamics of an ideal incompressible fluid with free boundaries have always attracted the attention of researchers because of the large number of practical applications described by the ideal-fluid model. These problems are difficult because of their non-linear nature and the initially unknown position of the free boundary. The appearance of numerical methods has breathed new life into scientific research as a whole and into the problems under consideration in particular. One effective approach to problems in the hydrodynamics of an ideal fluid is the boundary element method based on the Cauchy integral formula and Green’s third formula. Below we describe various recently solved problems, give references to some original works, and review the works of other authors on similar themes. The success achieved in solving all the problems mentioned below rests on a proper understanding of the numerical technology, based on a careful study of the main features of the boundary element method and the algorithms used [1]–[4].


2 Steady state plane problems

The solution of problems on flows with various singularities, and on flows over airfoils, smooth contours and other obstructions, is the subject of many works by domestic and foreign authors. In [5] the problem of supercritical flow of a heavy fluid in a channel with a curvilinear bottom is solved. The important conclusion is drawn that, when an obstruction of symmetrical form lies on the bottom, the form of the free surface is also symmetrical. However, for flow over a bulge at Froude numbers (Fr) close to one the problem has several solutions. This fact was first proved in [6]. The construction of two solutions in the exact non-linear formulation is studied numerically in [7], where it is shown that in the absence of any obstruction the first solution corresponds to a rectilinear flow and the second one describes a solitary wave. Instead of the Froude number, the author introduces the parameter V = V0/V∞, the ratio of the velocity V0 at the top of the wave to the velocity of the approaching flow V∞. In this case the Froude number is a function of V: Fr = Fr(V). Introducing the parameter V makes the solution of the problem of flow over obstructions unique and allows waves (up to the maximum one) to be constructed over the whole range of Froude numbers. In Sections 2.1–2.3 we use the same technique of introducing the parameter V in order to obtain the whole range of solutions of the steady problems of supercritical flows with free boundaries over objects. The existence of gravity waves with a sufficiently large period, including solitary waves, is proved in [8]. In [9] it is established, using a variational principle, that for an infinite set of Froude number values the problem has at least two different solutions. In [10] approximations of a small-amplitude solitary wave are constructed on the basis of the Boussinesq approximation.
Section 2.2 is devoted to the results for the steady problem of circulation flow of a heavy fluid of finite depth with a free surface over an airfoil. In [11] the method of discrete vortices is used to solve problems of flow over an airfoil. Boundary value problems for flows over airfoils often reduce to singular integral equations, which have a singularity and also a parametric peculiarity (the maximum thickness of the airfoil). This peculiarity is that the thinner the airfoil, the smaller the distance between neighboring points on the upper and lower sides of its contour. As a result, the equations written separately for the upper and lower sides of the airfoil become almost identical, which causes considerable difficulties for the numerical solution of the problem for a thin airfoil or near the trailing edge.
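The thin-airfoil difficulty described above can be illustrated with a classical symmetric Zhukovsky profile, for which the gap between the upper and lower sides is controlled by a single parameter. This is only an illustration under stated assumptions: the choice of profile, the parameter name `eps` and both function names are made up for this sketch and are not taken from the cited works.

```python
import cmath
import math

def zhukovsky_profile(eps, n=200):
    """Contour points of a symmetric Zhukovsky profile, z = zeta + 1/zeta.

    The generating circle is centred at (-eps, 0) and passes through
    zeta = 1 (the trailing edge), so eps controls the thickness and
    eps = 0 degenerates to a flat plate.
    """
    centre = complex(-eps, 0.0)
    radius = 1.0 + eps
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        zeta = centre + radius * cmath.exp(1j * theta)
        points.append(zeta + 1.0 / zeta)
    return points

def relative_thickness(eps, n=200):
    """Maximum thickness divided by chord length for the sampled contour."""
    pts = zhukovsky_profile(eps, n)
    chord = max(p.real for p in pts) - min(p.real for p in pts)
    thickness = max(p.imag for p in pts) - min(p.imag for p in pts)
    return thickness / chord
```

As `eps` tends to zero the upper and lower contour points approach each other everywhere, which is the situation in which collocation equations written separately for the two sides become nearly dependent.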


In [12] a system of integral equations for the tangential velocity components on the upper and lower sides of an airfoil is given that is free of the parametric peculiarity mentioned above. To solve the system of equations the author uses the method of discrete vortices and shows its applicability to problems of flow over airfoils of any thickness, including the minimal one. In [13] the paradox of the angular edge of an airfoil in unsteady ideal fluid flow is revealed: the solution of the non-linear problem of unsteady flow over an airfoil with an angular edge has no physical meaning if the boundary conditions on this edge are satisfied strictly. The paradox is a consequence of the accepted model of unsteady fluid flow near the angular edge, which assumes a break in the streamlines. It is established that the solution based on the hypothesis of smooth streamlines near the trailing edge, obtained by locally replacing the angular edge with a sharp one, is physically meaningful. In [14] the circulation flow of an unbounded fluid over a system of airfoils is investigated using the boundary element method (BEM). Green’s third formula for the stream function is used as the integral relation. In [15] the steady problem of flow over airfoils is solved using BEM and Green’s third formula for the stream function for the case of a bounded flow with a free surface. To determine the unknown free boundary, iteration algorithms for small and large Froude numbers are given. However, if the Froude number is close to one, a solution cannot be obtained by the suggested method. In [16] a method for investigating flow over airfoils is suggested that uses the Cauchy integral formula written for the tangential and normal velocity components. The author gives solutions both for a single airfoil and for a system of airfoils in an unbounded flow. In [17] the results of numerical simulation of viscous fluid flow over airfoils are presented. The drag force, the lift force and the moment are calculated as functions of the flow parameters. In [18] a boundary element method based on Green’s formula is suggested for potential problems of flow over airfoils. The method is tested on the problem of flow over the Zhukovsky airfoil. In [19] the boundary element method is applied to the problem of an airfoil moving beneath the free boundary of a fluid. The shapes of the free boundary are investigated as functions of the Froude number and the submergence depth of the airfoil. In [20] the boundary element method is applied to the problem of an airfoil moving near the bottom. The aerodynamic characteristics of airfoils are given as functions of the distance between the airfoil and the bottom. The steady problem of the flow of a heavy fluid of finite depth over a vortex under the free surface is solved in its full non-linear formulation in [21]. A numerical-analytical method is proposed for the
In [17] the results of the numerical simulation of flow over airfoils with a viscous fluid are presented. Depending on the parameters of the flow the values of the resistance power, the carrying power and the moment are calculated. In [18] the boundary element method on the basis of the Green formula for the solution of potential problems on flows over airfoils is suggested. The test is done by solving the problem on the flow over the Zhukovsky airfoil. In [19] the boundary element method is applied for the solution of the problem on an airfoil movement beneath the free boundary of a fluid. The shapes of the free boundary are investigated depending on the Froude number and the extent of submergence of the airfoil. In [20] the boundary element method is applied for the solution of the problem on an airfoil movement near the bottom. The aerodynamical characteristics of airfoils are provided depending on the distance between the airfoil and the bottom. The steady non-linear problem on the flow of the heavy fluid with the finite depth over a vortex under the free surface is solved in its full nonlinear formulation in [21]. The numerical analytical method is offered for the


calculation of the subcritical flow regime based on the rigorous theoretical results obtained in proving the existence theorem for the vortex problem [22]. In [23] the non-linear problem of a heavy fluid with a free surface flowing over a system of two vortices of opposite intensity is considered. The author presents the results of a numerical experiment on the influence of the vortex intensity and the Froude number on the shape of the free surface and on the hydrodynamic reactions of the singularities. In [24] the results of a numerical investigation of supercritical flow over a vortex under the free surface of a heavy fluid are presented. To solve the problem the authors propose an improved Levi-Civita method that takes into account the singularities of the unknown function.

All the works mentioned above either investigate problems with a free boundary while ignoring the influence of the bottom, or consider the influence of both the bottom and the free boundary but obtain the solution only for the subcritical range of Froude numbers. The works devoted to supercritical bounded flow over singularities pay little or no attention to the non-uniqueness of the solution as a function of the Froude number. Below we present the results of a numerical investigation of steady flows over obstructions that demonstrate this non-uniqueness.

2.1 Flow over obstructions

The works [25]–[28] contain the solution of the problem of flow over a semicircular cylindrical bulge up to the waves of maximum amplitude. It is established that in the region of maximum-amplitude waves the problem has several solutions for a given Froude number. Fig. 1(a) shows the computed dependence of the amplitude A on the Froude number Fr for different values of the radius R of the obstruction. Curve 1 corresponds to R = 0 (no obstruction on the bottom) and describes a solitary wave. Curve 2 corresponds to R = 0.1; curve 3 to R = 0.2; curve 4 to R = 0.3; curve 5 to R = 0.5; curve 6 to R = 0.7; curve 7 to R = 0.9; curve 8 to R = 1; curve 9 to R = 1.1. The line connecting the apices of the curves is the graph of the dependence A = Fr²/2, which describes the wave whose velocity at the crest vanishes. Fig. 1(a) shows two solutions of the problem for Froude numbers close to one. Fig. 1(b) shows an enlarged fragment of Fig. 1(a), demonstrating that there are ranges of Froude numbers within which the problem has three solutions. Fig. 1(c) shows the region of maximum-amplitude waves on an even larger scale.
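The limiting curve A = Fr²/2 follows from Bernoulli's equation written on the free surface. The short derivation below is only a sketch under assumptions that this passage does not state explicitly: lengths are scaled by the undisturbed depth H, velocities by √(gH), and Fr = U/√(gH) with U the velocity far upstream; q denotes the dimensionless speed on the free surface and η its elevation above the undisturbed level.

```latex
\begin{align}
  \tfrac12 q^2 + \eta &= \tfrac12\,\mathrm{Fr}^2
    && \text{(free surface; far upstream } q = \mathrm{Fr},\ \eta = 0\text{)} \\
  q = 0,\ \eta = A \ \text{at the crest}
    &\;\Longrightarrow\; A = \tfrac12\,\mathrm{Fr}^2 .
\end{align}
```

Any steady wave with a non-zero crest velocity therefore has A < Fr²/2, which is why this line bounds the solution curves from above.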

Simulation of problems with free surfaces by a boundary element method

Fig. 1. Dependence of the amplitude A on the Froude number Fr for different values of the radius R of the obstruction: (a) two solutions of the problem for Froude numbers close to 1; (b) enlarged fragment of Fig. 1(a); (c) enlarged region of the maximum-amplitude waves

One more solution branch becomes visible in this figure owing to the magnification. In the region of the highest waves the dependence of the amplitude A on the Froude number Fr is oscillatory, with rapidly decaying oscillations. For the inverse dependence Fr(A) we have found the first three extrema, which shows that there are ranges of Froude numbers within which the problem has one, two, three or four solutions. With no obstruction on the bottom we obtain non-linear steady solitary waves, which are used later to set the initial conditions in unsteady wave problems. In addition, [29] provides approximations of a solitary wave that reproduce the wave geometry and the distribution of the potential on it very well. It is established that the integral characteristics of a non-linear solitary wave, such as the mass and the kinetic and potential energies, reach their maximum and a local minimum before the highest wave is attained. The work [30] proposes a method of constructing the free boundary that avoids computing the velocity field and thereby reduces the computational error.
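To illustrate how a solitary wave can serve as an initial condition, the sketch below evaluates the classical first-order (Boussinesq) solitary-wave profile for unit depth. This is a stand-in chosen for illustration: it is not the fully non-linear solution computed by the authors, nor the approximation of [29], which is not reproduced in this text. The function names and the sample values A = 0.3, x0 = −5 are assumptions.

```python
import math


def soliton_elevation(x, amplitude, x0=0.0):
    """First-order (Boussinesq) solitary-wave elevation for unit depth.

    eta(x) = A * sech^2( sqrt(3A/4) * (x - x0) )
    This is an approximation, valid for small to moderate A only.
    """
    k = math.sqrt(0.75 * amplitude)
    return amplitude / math.cosh(k * (x - x0)) ** 2


def soliton_froude(amplitude):
    """First-order estimate of the Froude number, Fr = sqrt(1 + A)."""
    return math.sqrt(1.0 + amplitude)


if __name__ == "__main__":
    A, x0 = 0.3, -5.0  # illustrative values (assumed)
    for x in (-10.0, -7.5, -5.0, -2.5, 0.0):
        print(f"x = {x:6.2f}  eta = {soliton_elevation(x, A, x0):.5f}")
    print("first-order Fr estimate:", soliton_froude(A))
```

For high waves this first-order profile becomes inaccurate, which is one reason for starting unsteady calculations from the numerically computed steady solution instead.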


2.2 Circulation around airfoils

In [31] the steady problem of circulatory flow of a heavy fluid around an airfoil, bounded by a free boundary and a flat bottom, is considered. Many works are devoted to flows around airfoils but, in contrast to [31], the flow domain in them is as a rule infinite or bounded by solid walls. The free boundary adds complexity to the numerical modeling because its position is initially unknown and must be determined in the course of the solution, together with the velocity circulation along the contour. In solving the problem we have established that the dependence of the free-surface wave amplitude on the Froude number is multi-valued. Fig. 2 shows the computed dependence of the wave amplitude A on the Froude number Fr for flow around a Zhukovsky airfoil whose sharp edge is at the point (0; 0.5) (curve 1 corresponds to the angle of attack β = 0°, curve 2 to β = 15°). The dashed line (curve 3) is the graph of A = Fr²/2 and is the upper limit on the amplitude of steady solutions. The curves A = A(Fr) in Fig. 2 do not reach the dashed line. Evidently, for circulatory flow around airfoils a different estimate is needed for the upper bound on the amplitudes within which the problem has a steady solution. The qualitative behavior of the curves in Fig. 2 is analogous to that of the curves obtained for the steady flow over a semicircular bulge on the bottom. To solve the problem, in which the potential is multi-valued, the CBEM had to be modified: we rewrite it for the velocity field, which is continuous, and apply an original technique for imposing the Zhukovsky condition at the sharp trailing edge of the airfoil [30]. Fig. 3 shows the streamlines near the Zhukovsky airfoil (Fig. 3(a): β = 0°, Fr = 1.064, A = 0.087; Fig. 3(b): β = 15°, Fr = 1.237, A = 0.355). To construct the streamlines we compute the velocity field inside the flow domain. The good agreement of the flow patterns with the Zhukovsky condition serves as indirect confirmation that the calculations are correct.
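The statement that the airfoil curves stay below the limit A = Fr²/2 can be checked directly for the two cases quoted for Fig. 3. The snippet below is a minimal sketch that uses only those two (Fr, A) pairs; the names `limiting_amplitude` and `cases` are hypothetical.

```python
def limiting_amplitude(froude):
    """Upper bound on the steady-wave amplitude, A = Fr^2 / 2."""
    return 0.5 * froude ** 2


# (Fr, A) pairs quoted in the text for Fig. 3(a) and Fig. 3(b)
cases = {
    "beta = 0 deg": (1.064, 0.087),
    "beta = 15 deg": (1.237, 0.355),
}

for label, (fr, amp) in cases.items():
    bound = limiting_amplitude(fr)
    print(f"{label}: A = {amp}, bound = {bound:.3f}, A/bound = {amp / bound:.2f}")
```

Both amplitudes are well below the bound, consistent with the remark that a sharper estimate is needed for circulatory flow around airfoils.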

Fig. 2. Dependence of the amplitude A on the Froude number Fr for flow around the Zhukovsky airfoil (curve 1: β = 0°; curve 2: β = 15°)

Fig. 3. Streamlines near the Zhukovsky airfoil: (a) β = 0°, Fr = 1.064, A = 0.087; (b) β = 15°, Fr = 1.237, A = 0.355

2.3 Steady vortical flow over obstructions

In [32] the steady problem of plane-parallel vortical flow of an ideal incompressible fluid of finite depth with a free surface over an obstruction on the bottom is considered. The problem is described by the Poisson equation for the stream function, whose right-hand side contains the vorticity ω of the flow. The vorticity is assumed to be a constant in the range 0 < ω < 1. In solving this problem we have again found a multi-valued dependence of the wave amplitude on the Froude number for flow over a semicircular cylindrical bulge of radius R = 0.5 (Fig. 4). In the absence of an obstruction on the bottom the proposed algorithm makes it possible to compute solitary waves up to the highest one.
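For constant vorticity the Poisson problem can be reduced to a Laplace problem, which is what makes a boundary element treatment natural. The lines below sketch this standard reduction; it is not necessarily the formulation used in [32]. The sign convention u = ∂ψ/∂y, v = −∂ψ/∂x and the auxiliary function ψ₀ are assumptions introduced here.

```latex
\begin{align}
  \Delta\psi &= -\omega, \qquad
  \omega = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y} = \text{const}, \\
  \psi &= \psi_0 - \tfrac12\,\omega\,y^2
  \quad\Longrightarrow\quad \Delta\psi_0 = 0 .
\end{align}
```

The particular solution −ωy²/2 corresponds to a linear shear profile u = −ωy, so the boundary conditions for the harmonic part ψ₀ have to be shifted accordingly.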

3 Unsteady plane problems

3.1 Horizontal movement of a semi-infinite body in a fluid

In [33] the solution of the problem of a semi-infinite body moving along the surface of a fluid is presented. The difficulty in problems of this kind is associated with the infinite velocities at the points where the body meets the fluid. The work proposes the following approach. At the initial instant the non-penetration condition is imposed on the free

Fig. 4. The multi-valued dependence of the wave amplitude on the Froude number for flow over a semicircular cylindrical bulge of radius R = 0.5


boundary and the potential distribution along it is determined. At subsequent instants the non-penetration condition is replaced by the kinematic and dynamic conditions, and the problem is solved up to the last moment of overturning of the wave that arises in front of the body (Fig. 5). This way of setting the initial distribution of the potential on a free surface was used earlier in [34], [35].

3.2 Horizontal movement of a semi-circular cylinder along a plane bottom

In [36] the kinematic characteristics of the resulting flow are investigated. The main attention is paid to the characteristics of the wave arising behind the body and to the motion of the fluid in the range of Froude numbers within which no steady flows can exist. The first group of problems is conventionally called “essentially non-linear problems”, the second “essentially unsteady problems”. In the first group overturning waves arise behind the body. Because of the rapidity of the process, the considerable accelerations and the large distortions of the free boundary, they are among the most complicated objects in the study of wave phenomena. In [37], in a discussion of solitary waves, it is noted that an overturning wave can be divided into three zones, which appear even before the wave front becomes vertical (Fig. 6): 1) the velocity of individual fluid particles exceeds the wave velocity; 2) the front of the wave contains a thin layer within which the acceleration of the particles is much higher than accelerations within the

Fig. 5. The last moment of overturning of the wave arising in front of the body

rest of the wave; 3) on the rear face of the wave a fairly extensive zone arises within which the acceleration of the particles is very small. Although the waves considered in the present work differ from solitary waves, they exhibit the same features as overturning solitary waves, as is clearly seen in Fig. 6, which shows the computed velocity and acceleration fields. The zones described in the review mentioned above are evidently present in these problems as well; they are marked with the corresponding digits.

The “calmer” flows of the second group are conventionally divided into three regimes: I, supercritical flows, for which a steady regime exists: 1.2 < Fr < 100; II, subcritical flows, for which steady solutions also exist: 0.1 < Fr < 0.6; III, near-critical flows, for which no steady solutions are possible: 0.6 < Fr < 1.2. Flows of regime III should not approach a steady state. However, no difficulties connected with this circumstance arose during the calculations, and the investigation of the kinematic flow pattern revealed nothing unusual either. The only evident fact is that the necessary condition for a steady solution is violated here, i.e. the drag force in these regimes does not tend to zero.

3.3 Interaction of solitons with obstructions

Unsteady problems of the interaction of solitary waves with various obstructions are the subject of a great number of theoretical and experimental works because of their importance for determining the effect of waves on hydraulic structures and on enclosed water areas in ports. In problems of solitary waves running onto coastal structures the main attention is paid, as a rule, to determining the maximum run-up and the dynamic loading. An extensive literature on this subject presents the results of

Fig. 6. The overturning (capsizing) wave: the velocity field and the acceleration field


both field [38]–[40] and numerical experiments based on analytical, semi-analytical and numerical methods [28], [37], [41]–[47]. Some flow regimes are accompanied by wave overturning and are the most difficult for numerical modeling because of the essentially non-linear behavior of the free boundary in the last moments before breaking. Many authors (e.g. [37]) have attempted to analyze the character of wave breaking during interaction with various obstructions.

In [48] the authors determined the maximum amplitude of a non-breaking solitary wave in a series of laboratory tests. The experiments were carried out for a horizontal and an inclined bottom. On the basis of the results the authors recommend that the water depth exceed 10 cm in order to avoid scale effects.

In [49] the free-surface flow arising from the motion of the bottom or a wall of a basin is considered. For the case of a rising part of the bottom, flow patterns are given for different Froude numbers. When a wall of the basin moves, a bore is formed, which later transforms into a train of dispersing waves. The work presents the flow patterns produced by the motion of a vertical and an inclined wall of the basin for different wall velocities. Some regimes lead to breaking of the waves formed. When no breaking occurs, the first departing wave eventually takes the form of a soliton.

In [50] the results of laboratory experiments on the overturning of a solitary wave interacting with a rectilinear bottom ledge are presented. A classification of the wave regimes by type of overturning is given, depending on the wave amplitude and the obstruction shape. In [51] the results of laboratory experiments on the damping of a wave interacting with a rectangular bottom obstruction are presented. The onset of wave breaking is studied as a function of the wave amplitude and the obstruction shape.

The authors of [52] propose a model (the irrotational Green-Naghdi model) for studying non-linear wave processes. Its applicability is demonstrated by solving a number of wave problems (breaking of a soliton, run-up on a vertical wall and on a sloping shore) and by comparing the results with experiments and with the results of other authors. In [53] the results of numerical simulation by a particle method are presented for the interaction of solitary waves with a vertical wall, an inclined wall and a submerged rectangular bottom ledge. The effects of wave breaking are also investigated. In [54] a parallel finite element method is presented and used to solve two- and three-dimensional problems of free-boundary flow over a circular cylinder. Some of the regimes involve breaking of the waves formed.


Fig. 7. Zones corresponding to the different wave regimes: W-T, wave train; F-B, forward breakers; B-B, backward breakers; C-C, crest-crest exchange; T, Tanaka instability

In [55] the directions of future research on wave problems are discussed. They include the development of numerical simulation for the investigation of non-linear wave processes; the use of numerical methods based on discrete particles, which allow simulation of wave overturning with the formation of spray, air cavities and vortex zones; and the need for a mechanism that describes wave breaking as accurately as possible.

Below we present the results of numerical calculations of unsteady problems of the interaction of solitary waves with a semi-circular cylindrical bulge on the bottom, with an inclined solid wall and with a submerged rectangular ledge. For each case a classification of the resulting flows by type of overturning is given as a function of the governing parameters. We also consider the interaction of solitary waves with a solid partially submerged in a fluid.

In [56], [57] the interaction of solitary waves with a semi-circular cylindrical bulge and with an inclined solid wall is considered. The problem of a wave running onto a semi-circular cylindrical bulge is described in detail in [58], where it is shown that the interaction of the wave with the submerged cylinder generates different wave patterns. The authors of [58] classified the wave motions and identified five zones corresponding to different wave regimes: W-T, wave train; F-B, forward breakers; B-B, backward breakers; C-C, crest-crest exchange; T, Tanaka instability. The diagram (Fig. 7) shows these zones as a function of the radius R of the bulge and the amplitude A of the wave. The authors of the present paper carried out calculations of this problem independently of [58] and compared the results in detail with the diagram given there. The calculations agree completely with the proposed wave classification and fit well into the diagram (Fig. 7) taken from [58] (the dots mark the numerical calculations performed).

Fig. 8. Flow regimes by type of overturning depending on the inclination angle of the wall: (a) F-B; (b) F-B-F; (c) B-B; (d) W-T

In the second problem four flow regimes are likewise distinguished by type of overturning, depending on the inclination angle of the wall. The regime names (W-T, Fig. 8(d); F-B, Fig. 8(a); B-B, Fig. 8(c)) are analogous to those of the first problem, but there is a new regime F-B-F (Fig. 8(b)): forward breakers during the run-down of the wave. The variable parameters of the problem are the inclination angle of the wall, which ranges from 5° to 90° in steps of 5°, and the soliton amplitude, 0.2 < A < 0.6. The diagram of the resulting flows as a function of the wave amplitude and the wall inclination angle is shown in Fig. 9.

The works [31], [59] contain the results of solving the problem of the interaction of solitary waves with a submerged rectangular ledge. The problem in its complete non-linear formulation is solved by the complex boundary element method. The overturning of the waves is analyzed on the basis of numerous calculations. For a ledge of height h three types of wave flow have been revealed: the wave does not overturn and in its further motion takes the form of a steady wave of larger amplitude than the initial one (Fig. 10); the wave overturns from its crest onto the front face (the “floating surf”, a spilling breaker; Fig. 11); the wave overturns from the front face to its foot (the “diving surf”, a plunging breaker; Fig. 12). In addition, we have investigated the dispersing tail of small-amplitude waves behind the main wave. Each departing wave originates on the front face of the main wave, where a wave clot arises, rolls over the crest and then moves away from it.
Fig. 9. The resulting flows depending on the wave amplitude and the inclination angle of the wall

Fig. 10. The regime of motion of a wave of small amplitude (A = 0.1; h = 0.7; t ∈ [5; 14.5])

Fig. 11. The mode of a floating surf (A = 0.5; h = 0.3; t ∈ [15; 17.4])

Fig. 12. The mode of a diving surf (A = 0.7; h = 0.7; t ∈ [7; 10.8])

The problem of the interaction of waves with a solid partially submerged in an ideal incompressible fluid belongs to the class of non-linear problems with unsteady free boundaries, whose shape is initially unknown and must be determined in the course of the solution. One of the most important problems is the determination of the effect of surface waves on hydraulic structures [39], [60]. In [55], [61] the authors apply the finite difference method and the marker-and-cell method to investigate the characteristics of three-dimensional non-linear wave motions interacting with a fixed solid. The numerical results obtained are compared with experimental results.

Next, the present work considers the numerical solution of the plane problem of wave motion of an ideal incompressible fluid in a basin of constant depth H = 1 bounded by vertical impermeable walls at x = −15 on the left and x = L on the right (the calculations on this problem were carried out by E.N. Berezin). The interaction of a solitary wave with a partially submerged solid of rectangular cross-section is studied. The parameters varied are: A, the amplitude of the wave; h, the distance from the bottom to the obstruction; b, the distance from the rear boundary of the solid to the right boundary of the basin; and a = x_r − x_l, the width of the solid, where x_l = x1 and x_r = x_l + a are the abscissas of the left and right vertical faces of the solid. Figures 13 and 14 show the flow patterns arising when a wave of amplitude A = 0.3 interacts with the solid. The geometrical parameters of the obstruction are h = 0.5, a = 1. At the initial instant t = 0 the wave crest is located at x = −5. When the wave is reflected off the front face of the solid, a transmitted (“past”) wave of a certain amplitude is formed to the right of the solid (Fig. 13). If the solid is near the

Fig. 13. Profiles of the free surface for the wave amplitude A = 0.3: b = 18, x_l = 5, L = 21

Fig. 14. Profiles of the free surface for the wave amplitude A = 0.3: b = 0.5, x_l = 13.5, L = 15

vertical wall (Fig. 14), the flow pattern becomes more complicated.

Fig. 15. (a) Maximum run-up on the front face of the solid, Y_l; (b) maximum run-up on the rear face of the solid, Y_r; (c) amplitude of the reflected wave, A_refl; (d) amplitude of the transmitted (“past”) wave, A_past

An almost uniform oscillation of the fluid column with a large amplitude occurs in the gap between the solid and the right wall of the basin. A series of calculations for different values of the amplitude A of the incoming wave, the width a of the solid and the gap h showed that, as the width of the solid and its depth of submergence increase, the run-up on the front face of the solid Y_l (Fig. 15a) and the amplitude of the reflected wave A_refl (Fig. 15c) increase, while the run-up on the rear face of the solid Y_r (Fig. 15b) and the amplitude of the transmitted wave A_past (Fig. 15d) decrease. Lines 1, 2, 3, 4 correspond to the width of the solid a: 1 − a = 1,

Fig. 16. Time histories of the dynamic load P_s on the solid boundaries of the body: (a), (d) b = 8; (b), (e) b = 4; (c), (f) b = 0.5

2 − a = 2, 3 − a = 4, 4 − a = 8. In these calculations the distance h from the bottom of the basin to the lower boundary of the solid is 0.5. Fig. 16 shows time histories of the load P_s for a wave of amplitude A = 0.1 as a function of the time t and of the distance h between the bottom of the basin and the lower boundary of the solid. Curves 1–4 correspond to different values of h: 1 − h = 0.1, 2 − h = 0.3, 3 − h = 0.5, 4 − h = 0.7. Figures 16a–16c show the variation of the load on the face


Fig. 17. Collapse of a semi-circular cavity (the plane case)

of the solid at x = x_l. Figures 16d–16f show the variation of the load on the rear boundary of the solid at x = x_r. When the distance b between the rear boundary of the solid and the right boundary of the basin is large enough, two local extrema of the load are observed. This is explained by the fact that, when the solitary wave runs onto the front face of the solid, the wave formed to the right of the solid is reflected off the right wall of the basin. As b decreases, a single maximum of the dynamic load is observed, since practically no transmitted wave is formed behind the solid.

3.4 Cavity collapse on the free boundary, fluid oscillation

In [25], [62], [63] the solution of the unsteady problem of the collapse of a cavity of a given shape in a basin of finite depth is considered. This problem was first formulated by M.A. Lavrentiev and investigated by V.K. Kedrinski with the EHDA method. This class of problems is of interest because a high-velocity cumulative jet of complex geometry is formed during the collapse, which makes the phenomenon difficult to model mathematically. In [64] the use of the Euler-Lagrange approach for the simulation of free-surface flows is considered; the finite element method is applied for the solution. The method is demonstrated on the problem of a spherical drop sinking into a pond. At the initial stage the drop is partially submerged in the fluid (in order to avoid a solution domain that is not simply connected) and is an object of a certain radius with smoothed zones of contact with the fluid. The work presents the resulting flow patterns. The work [65] provides the results of full-scale experiments on the interaction of a sinking drop with the free surface. The interaction of the drop with the fluid involves the formation of a semi-circular cavity on the free boundary and its subsequent collapse.


An analogous problem was solved earlier in [66], which compares experimental data with numerical calculations by the boundary element method. The problem considered here is solved in the plane and axisymmetric formulations. Fig. 17 compares the free-surface shapes obtained (left) with the results presented in [67] (right) and shows good qualitative and quantitative agreement. For a more detailed description of this problem and the results of its solution see [62].

4 Unsteady axisymmetric problems of a cavitation bubble

The greatest success in solving unsteady problems of the dynamics of a gas bubble in an ideal fluid has been achieved with boundary integral equation (BIE) methods. The first attempts to apply BIE methods to the dynamics of a single bubble were made in [68]–[72]. Experimental results on the dynamics of a single bubble near a free boundary are described in [73]–[75], and numerical solutions of this problem are the subject of [76]–[79]. The first numerical results on the collapse of a bubble in an unbounded fluid near a solid wall were obtained in [70], [80]–[83]. The behavior of a hemispherical bubble attached to a solid wall by its base was studied in [84]. The research showed that during the collapse the bubble loses its initial spherical shape and that just before collapse a cumulative jet directed toward the solid wall is formed. This was also confirmed by the experimental results presented in [85]–[91]. A good number of experimental (see, for instance, [86], [92]) and numerical [66], [86], [93], [94] results on the dynamics of a single gas bubble have appeared recently. The papers [95], [96] are devoted to the influence of a solid wall and a free surface on the evolution of an initially expanding bubble.

Fig. 18. Evolution of two bubbles near a solid wall

Fig. 19. Evolution of two bubbles near a free surface


The migration of the bubble toward one of the boundaries is determined by the Kelvin impulse. Theoretical investigations of the Kelvin impulse can be found in [68]. The dynamics of interacting bubbles in a chain has been studied to a much lesser extent. A partial solution of this problem is given in [97], which investigates the behavior of two bubbles that are initially in the phase of maximum expansion, with different laws of variation of the internal pressure. The results were presented at the International Conference in Grenoble (France, 1988). The calculations in the cited works showed that the behavior of bubbles in a chain differs fundamentally from that of a single bubble. The interaction of bubbles in a chain can be regarded as the simplest form of bubble interaction in a more complex structure, i.e. a bubble cloud [98]. This class of problems is very difficult to study, which is indirectly confirmed by the small number of publications on the subject. In [86] an experimental and numerical investigation of the behavior of two bubbles located above a solid wall is presented. It shows that an upper bubble close to the lower one, which is located near the solid wall, exerts such a strong counteraction that the lower bubble does not always collapse toward the wall as a single bubble does [83], [99]. In [92] the interaction of two bubbles located on the axis of symmetry and formed at different instants is studied experimentally; a strong dynamic influence of the bubbles on each other is shown. On the basis of this experiment our work presents a numerical calculation that gives good agreement between the experimental and numerical results.

In [25], [100], [101] the dynamics of a single bubble located between a free surface and a solid wall is considered. These works discuss the combined influence of solid walls and free boundaries on the bubble evolution and give the mathematical formulation of the problem for the simultaneous influence of a solid wall and a free surface. In [25] the dynamics of two bubbles located one above the other is considered. The problem is solved in two variants: 1) the bubbles are sufficiently far from the free surface of the fluid; 2) the bubbles are near the solid wall (Fig. 18) or the free surface (Fig. 19). It is shown that the proximity of the upper bubble to the lower one located close to the solid wall causes such a strong counteraction that the lower bubble does not always collapse toward the wall as a single bubble does. In [102], [103] the axisymmetric problem of the dynamics of three initially spherical bubbles whose centers lie one above another on the axis of symmetry is considered. This problem admits a very large number of variants, which is why only eight calculations with non-repeating combinations are presented. For all the calculations, along with the shape of the bubbles, the following graphs are provided: the volume of the bubbles, the position of the center


of the bubbles, the normal derivative at the points of the bubbles on the axis of symmetry, and the pressure inside the bubbles.

5 Unsteady spatial problems of a cavitational bubble This section considers the evolution of a gas-vapor bubble in the infinite volume of the ideal incompressible ponderable fluid limited with a solid wall in the full non-linear spatial formulation. The model described in this work is applied for simulation of different, unlike on the face of it, phenomena. On the one hand we consider the dynamics of underwater shocks, on the other hand we forecast the losses because of cavitation damage when studying the whole phenomenon of bubble cavitation is impossible without the investigation of dynamics of a single, separately taken, cavity [104]. Several works have recently appeared devoted to the investigation of this model in the aspect of biophysics, namely concerning destruction of nephroliths inside lithotripters, as well as investigation of performance reliability of an artificial mitral valve [105]. This problem is well studied in the axis-symmetric formulation, however there are only few works devoted to investigation of dynamics of a bubble in the full spatial formulation [69], [106], [107]–[109]. The problem comes to succession of linear boundary problems at each step on time [110]. The methods of movement on time are described in [1]. At each step on time we receive the Dirichle problem for the Laplace equation solved by the boundary element method. The bubble surface is approximated with linear triangular elements. In the initial moment of time the bubble surface is a sphere of small radius located on the distance d from the solid wall. As the initial one we take the value of the potential from the Rayleigh problem. In the process of evolution the bubble is expanded up to its maximal radius and then passes into the phase of locking. The research concerns the influence of the closely located solid wall, gravity, gas concentration in the bubble and surface tension on the process of evolution of the bubble. 
During expansion the bubble, as a rule, keeps a spherical or nearly spherical shape. During collapse, gravity breaks the spherical symmetry of the bubble: a cumulative jet of fluid forms and penetrates the bubble, directed against gravity. If the bubble is located near a solid wall (no farther than 3–4 of its maximum radii) and gravity is absent, one observes the well-known attraction of the bubble to the solid wall [95]; the cumulative jet formed during collapse develops from the side of the bubble farthest from the wall and is directed toward the wall. In a weightless unbounded fluid, gas inside the bubble makes the evolution pulsatory: expansion gives way to collapse and vice versa, and the bubble keeps its spherical shape throughout its existence. In theory the bubble pulsations could last indefinitely

since the model does not account for energy losses due to acoustic radiation. In numerical simulation only a few bubble pulsations (no more than five) can be obtained. The calculation breaks down because the spherical symmetry of the free boundary (the bubble surface) is lost, which leads to numerical instability and abnormal termination. When the spherical symmetry is broken by a solid wall and/or by gravity, gas inside the bubble does not lead to repeated pulsations, since the bubble loses its spherical shape during its first collapse. In this case the gas only counteracts the collapse, thus changing the timing of the process. The influence of surface tension is noticeable only in the absence of gas inside the bubble. Surface tension resists the expansion of the bubble and hinders its deformation during collapse. When several of these factors act at once, different scenarios of bubble evolution are possible. During collapse the bubble may form sharp edges or peaks that break off the numerical calculation, and the further behavior of the bubble cannot be forecast in this situation (Fig. 20a). The initial bubble may also tend to split into two smaller ones (Fig. 20b). In most cases, however, a cumulative jet forms, penetrates the bubble, and develops until it touches the opposite wall of the bubble (Figs. 20c–20f). The direction and velocity of the jet are determined by the value of the buoyancy parameter, which controls the effect of gravity, and by the location of the solid wall [111]–[115].
We consider the possibility of forecasting the direction of bubble migration and of jet development by means of the Kelvin impulse described in [68], [108]. The Kelvin impulse is an estimate of the momentum vector obtained under a number of assumptions. The study shows that estimates lying above the zero curve of the Kelvin impulse are, as a rule, correct; all false or doubtful estimates lie below this curve. Of greatest interest are the cumulative effect and the possibility of damaging a target (a solid wall). The most important parameters for evaluating the erosive effect are the height and direction of the jet, the velocity of its tip, and the presence of a fluid layer between the bubble and the solid wall [104]. When passing to dimensional quantities one should distinguish different types of bubbles: single cavitation bubbles with radii of 1 and 10 mm described in [116], a bubble of radius 1 m formed by the explosion of a mine [117], and the bubble of radius 120 m formed by the 30 kt nuclear explosion “Wigwam” [118]. The dimensional characteristics obtained for the different types of bubbles differ substantially. The maximum jet velocity of a cavitation bubble does not exceed 200 m/s. According to [104], the minimum jet velocity needed to pierce a target made of the softest material (aluminum) is 1.83 × 10^5 m/s. Evidently the jet formed during

collapse of a single cavitation bubble cannot reach such a velocity or cause any damage to the solid wall. For the mine explosion the maximum jet velocity is 5176 m/s, more than three times the speed of sound in water, which fully agrees with the estimates given in [117]. For the nuclear explosion the jet velocity is 22996 m/s, more than ten times the speed of sound in water.
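The comparison of the jet velocities with the speed of sound can be checked by simple arithmetic. The sketch below assumes a round value of 1500 m/s for the speed of sound in water (a value not stated in the text) and reads the mine-explosion jet velocity as 5176 m/s; the dictionary keys are illustrative labels.

```python
SOUND_SPEED_WATER = 1500.0  # m/s, assumed round value (not given in the text)

# Maximum jet velocities quoted in the text, in m/s.
jet_speeds = {
    "cavitation bubble": 200.0,
    "mine explosion": 5176.0,
    "nuclear explosion": 22996.0,
}

# Ratio of each jet velocity to the assumed speed of sound.
ratios = {name: v / SOUND_SPEED_WATER for name, v in jet_speeds.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times the speed of sound")
```

Under this assumption the mine-explosion jet is about 3.5 times and the nuclear-explosion jet about 15 times faster than sound in water, while the cavitation-bubble jet stays well below it.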

6 “AKORD” package for solution of problems with free boundaries

The rapid development of computing equipment and its introduction into practically all spheres of life has led to the wide spread of packages of application programs (PAP) intended for the solution of scientific, engineering and applied problems. One trend in the development of such packages is the creation of so-called systems of automated analysis (CAE, Computer Aided Engineering) aimed at solving various problems of mathematical physics. Together with systems of automated design (CAD, Computer Aided Design) and systems of automated manufacturing (CAM, Computer Aided Manufacturing), the systems of automated analysis form a unified software and information environment that covers the whole cycle of solving a problem, from its formulation to the delivery of finished design solutions. Among such systems we should single out the finite element packages ANSYS, ALGOR, NASTRAN, COSMOS, I-DEAS, etc. One of the most widespread of them is currently the ANSYS package (http://www.ansys.com), intended for static and dynamic analysis of structures with geometric and physical non-linearity; creep and plasticity problems; linear and non-linear stability of structures; steady and unsteady thermophysics problems with phase transitions; fluid and gas dynamics; electromagnetic fields (including high-frequency analysis); acoustics; and coupled problems (such as the interaction of a fluid with a structure). A popular package realizing the boundary element method is the BEASY package (http://www.beasy.com).
This package consists of four principal modules: BEASY Mechanical Design, for problems of mechanics; BEASY Fatigue and Crack Growth, for the analysis of material fatigue and crack growth; BEASY Acoustic Design, for problems of acoustics; and BEASY Corrosion and Cathodic Protection, for problems of corrosion and anti-corrosion protection. The Institute of Computational Technologies of the Siberian Branch of RAS has developed the packages “Wave in the Ocean”, “TSUNAMI”, “EVENT” and “START” [119], intended for the investigation of tsunami wave propagation. They are able to simulate different scenarios of the development of the phenomenon

Fig. 20. Shapes of the bubbles during their evolution near a solid wall (panels a–f)

using a large body of field and experimental data and, in operational mode, can process data arriving via telecommunication channels. These integrated systems are intended to support decision making under the extreme conditions of an impending catastrophe or in planning the development of coastal areas. We have developed the “AKORD” package described in [1], [120]–[122] for the solution of steady and unsteady problems of the hydrodynamics of an ideal incompressible fluid with free boundaries (for the plane and axisymmetric cases). The package consists of several main applications that implement all principal stages of the numerical solution of a problem, from the description of the initial data to the graphical analysis of the results. The pre-processor is the component that prepares the input data for a numerical experiment. The solver solves the problem numerically on the basis of the data prepared by the pre-processor. The post-processor is the module that presents the solver results graphically. Each component of the “AKORD” package can work as a separate program unit that independently receives all the necessary data, processes them and returns the results in accordance with the common exchange interface created to keep the applications of the package compatible. Further development of the package naturally calls for adding a set of applications for the solution of three-dimensional problems. As a first step in this direction, the package contains a pre-processor prototype [124] that prepares the initial data for the solution of three-dimensional problems by the boundary element method. This prototype is still far from a complete application, but it already has the principal capabilities needed to generate data for the three-dimensional solver.
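The division into pre-processor, solver and post-processor joined by a common exchange interface can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the JSON exchange format and the placeholder computation are hypothetical and do not reproduce the actual interface of the “AKORD” package.

```python
import json
import math

def preprocess(n_nodes, radius):
    """Pre-processor: prepare input data (boundary nodes of a circle) as JSON text."""
    nodes = [[radius * math.cos(2 * math.pi * k / n_nodes),
              radius * math.sin(2 * math.pi * k / n_nodes)] for k in range(n_nodes)]
    return json.dumps({"nodes": nodes})

def solve(input_text):
    """Solver stand-in: read the exchange record, return results in the same format."""
    nodes = json.loads(input_text)["nodes"]
    # Placeholder computation (polygon area by the shoelace formula), not a BEM solve.
    area = 0.5 * sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(nodes, nodes[1:] + nodes[:1]))
    return json.dumps({"n_nodes": len(nodes), "area": area})

def postprocess(result_text):
    """Post-processor: present the solver results (here as one line of text)."""
    result = json.loads(result_text)
    return f"{result['n_nodes']} nodes, enclosed area {result['area']:.4f}"

report = postprocess(solve(preprocess(n_nodes=400, radius=1.0)))
print(report)
```

Because each stage only reads and writes the shared text format, any stage can be run or replaced separately, which is the property the package description emphasizes.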
The pre-processor consists of several functional blocks combined into a single application through a program interface. The surface grid generation block creates a set of elements approximating the surfaces of the three-dimensional objects under consideration. The geometric transformation block serves to visualize the grid and to manipulate it by scaling, moving and rotating, which is necessary for visual work with three-dimensional objects. The problem parameter block allows complete specification of both the conditions of the problem and the parameters for the specific version of the solver. Special parameters control the calculation process itself and determine the required format of the results.
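The geometric transformations applied to a surface grid of triangular elements can be sketched as follows. The mesh (an octahedron standing in for a very coarse closed surface) and all function names are assumptions for illustration, not the pre-processor's actual data structures.

```python
import math

# Coarse closed surface made of 8 linear triangular elements (an octahedron).
VERTICES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
TRIANGLES = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]

def scale(verts, s):
    """Scale all vertices about the origin by the factor s."""
    return [(s * x, s * y, s * z) for x, y, z in verts]

def move(verts, dx, dy, dz):
    """Translate all vertices by the vector (dx, dy, dz)."""
    return [(x + dx, y + dy, z + dz) for x, y, z in verts]

def rotate_z(verts, angle):
    """Rotate all vertices about the z axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in verts]

def surface_area(verts, tris):
    """Sum of triangle areas, each computed as half the cross-product norm."""
    total = 0.0
    for i, j, k in tris:
        a, b, c = verts[i], verts[j], verts[k]
        u = [b[m] - a[m] for m in range(3)]
        v = [c[m] - a[m] for m in range(3)]
        cross = (u[1] * v[2] - u[2] * v[1],
                 u[2] * v[0] - u[0] * v[2],
                 u[0] * v[1] - u[1] * v[0])
        total += 0.5 * math.sqrt(sum(w * w for w in cross))
    return total

base_area = surface_area(VERTICES, TRIANGLES)
moved = move(rotate_z(scale(VERTICES, 2.0), math.pi / 6), 0.0, 0.0, 3.0)
new_area = surface_area(moved, TRIANGLES)
print(base_area, new_area)
```

Only the vertex coordinates change; the element connectivity is untouched, so rotation and translation preserve the surface area and scaling by 2 multiplies it by 4.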

7 Conclusions

In recent years great attention has been paid to high-performance parallel computations on clusters of workstations. The principal advantages of

Fig. 21. Time required for the whole calculation

Fig. 22. Speedup of the parallel algorithm

clusters built on a PC network are the possibility of gradual expansion and upgrading [125]. Moving existing packages of sequential programs to a parallel basis raises a number of problems [126]. First, it requires fast communication hardware that provides a medium for data transfer. Second, in writing the programs one must take into account the network topology and the parallel properties of the algorithms being implemented and, in addition, distribute the data among the processors so as to minimize the number of requests and the volume of transferred data. That is why writing parallel algorithms is much more complicated than writing sequential ones. To date we have investigated the parallel properties of algorithms implementing the complex boundary element method (CBEM) for the solution of steady and unsteady problems with free boundaries in simply connected domains. We have parallelized the program code and demonstrated the speedup and efficiency of the parallel programs on cluster systems [127]. The cluster is a set of PCs with the following node configuration: Celeron 400 MHz processor, 64 MB RAM, 100 Mbps Fast Ethernet network, RedHat Linux 6.2 operating system, and MPICH 1.2.1 communication library. To analyze the efficiency of the parallel algorithm for an unsteady problem we solved the test problem of the motion of a solitary wave of amplitude A = 0.5 in a basin of constant depth H = 1. What matters in this test is that solitary waves do not change their amplitude and velocity as they move and keep their shape and total energy (this was used to check the accuracy of the algorithm). For the calculation we took the domain D = {−20 ≤ x ≤ 20; −1 ≤ y ≤ y0}, where y0 describes a steady solitary wave. At t = 0 the wave crest was at the point x = −5, y = 0.5.
The calculation was run up to the dimensionless time t = 8.28, when the wave crest reached the point with abscissa x = 5. By that time the wave had traveled a distance of 1.3 wavelengths. To analyze the parallel properties of the algorithm we ran a series of calculations with different numbers of nodes on the boundary of the computational domain (from 400 to 2000 in steps of 200) and different numbers of processors (1, 2, 4, 8).
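One common way to distribute the data among processors, assumed here purely for illustration since the text does not specify the scheme actually used, is to give each processor a contiguous block of boundary nodes (matrix rows) of nearly equal size. The function name `block_ranges` is hypothetical.

```python
def block_ranges(n_rows, n_procs):
    """Split n_rows into n_procs contiguous [start, stop) blocks of near-equal size."""
    base, extra = divmod(n_rows, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        # The first `extra` ranks receive one additional row.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

# The grid of test runs described above: 400..2000 nodes, 1..8 processors.
for n_nodes in range(400, 2001, 200):
    for n_procs in (1, 2, 4, 8):
        sizes = [stop - start for start, stop in block_ranges(n_nodes, n_procs)]
        assert sum(sizes) == n_nodes and max(sizes) - min(sizes) <= 1

print(block_ranges(1000, 8)[:3])  # [(0, 125), (125, 250), (250, 375)]
```

With block sizes differing by at most one row, the work per processor is balanced for every combination of node count and processor count in the series.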

Fig. 23. Efficiency of the parallel algorithm

Fig. 21 shows the time required by the parallel algorithm to solve the unsteady problem of solitary wave motion up to the dimensionless time 8.28 on 1, 2, 4 and 8 processors. Figures 22–23 show the speedup and efficiency of the parallel CBEM algorithm on 1, 2, 4 and 8 processors. The high efficiency shows that the programming did not reduce the degree of parallelism of the algorithms. The analysis of the parallel algorithm for steady and unsteady problems allows us to conclude that a cluster of ordinary PCs can deliver considerable speedup for real problems, with efficiency exceeding 90% even over a Fast Ethernet network. The algorithm implementing CBEM can be easily parallelized, and its average degree of parallelism is close to ideal.
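Speedup and efficiency are understood here in the usual sense: for a run time T_p on p processors, the speedup is S_p = T_1 / T_p and the efficiency is E_p = S_p / p. The sketch below evaluates both quantities; the timings in it are made-up numbers chosen for illustration and are not the measured data behind Figs. 21–23.

```python
def speedup_and_efficiency(times):
    """times maps processor count p to run time T_p; the entry for p = 1 is required."""
    t1 = times[1]
    return {p: (t1 / tp, t1 / (tp * p)) for p, tp in sorted(times.items())}

# Made-up timings in seconds, for illustration only (not the measured data).
timings = {1: 800.0, 2: 410.0, 4: 210.0, 8: 110.0}
table = speedup_and_efficiency(timings)
for p, (s, e) in table.items():
    print(f"p={p}: speedup {s:.2f}, efficiency {e:.1%}")
```

An efficiency above 90% on 8 processors, as in these illustrative numbers, corresponds to a speedup of more than 7.2.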

References

1. Afanasiev KE, Goudov AM (2001) Information technologies in numerical calculations. Kemerovo State University, Kemerovo (in Russian) 2. Afanasiev KE, Samoilova TI (1995) Comput Techn 7(11):19–37 (in Russian) 3. Afanasiev KE, Stukolov SV (2001) CBEM for solution of plane problems on hydrodynamics and its realization on parallel computors. Kemerovo State University, Kemerovo (in Russian) 4. Afanasiev KE, Stukolov SV (2002) Solution of non-linear problems in the hydrodynamics of an ideal fluid with free boundaries by the method of boundary elements. In: International Conference on High Speed Hydrodynamics, Cheboksary 5. Kiselev OM, Kotlyar LM (1978) Non-linear problems in the theory of jet streams of heavy fluid. Kazan State University, Kazan (in Russian) 6. Moiseev NN (1957) J Prikl Mat Mekh 21(6):860–864 (in Russian) 7. Guzevski LG (1982) Flow over obstructions by heavy fluid with a finite depth. In: Dynamics of continuous media with separating boundaries. Chuvash State University Press, Cheboksary (in Russian) 8. Lavrentiev MA (1975) Prikl Mekh Tekh Fiz 5:3–46 (in Russian)

9. Plotnikov PI (1991) Izv Akad Nauk SSSR, Ser Mat 55(2):339–366 (in Russian) 10. Bona JL, Chen M, Saut J-C (2002) J Nonlinear Sci 12:283–318 11. Belotserkovski SM, Kotovski VN, Nisht MI, Fedorov RM (1988) Mathematical modeling of plane parallel interrupted flow over bodies. Nauka, Moscow (in Russian) 12. Gorelov DN (1992) Izv Akad Nauk, Mekh Zhidk Gaza 4:173–177 (in Russian) 13. Gorelov DN (2002) J Appl Mech Tech Phys 1:37–42 (in Russian) 14. Terentiev AG, Kartuzova TV (1995) Numerical investigations of a system of airfoils by the boundary element method. In: Actual problems on mathematics and mechanics. Chuvash State University Press, Cheboksary (in Russian) 15. Yasko NN (1995) Izv Akad Nauk, Mekh Zhidk Gaza 4:100–107 (in Russian) 16. Mokry M (1990) AIAA J 127:1–11 17. Cebeci T, Besnard E, Chen HH (1998) Comput Fluids 27(5–6):651–661 18. Hwang WS (1982) Comput Methods Appl Mech Eng 190:1679–1688 19. Bal S, Kinnas SA (2002) Comput Mech 28:260–274 20. Dragos L, Dinu A (1995) Comput Methods Appl Mech Eng. 121:163–176 21. Maklakov DV (1995) Izv Akad Nauk, Mekh Zhidk Gaza 2:108–117 (in Russian) 22. Maklakov DV (1990) Existence of solution of the problem of pre-critical flow over a vortex. In: Some applications of the functional analysis to problems of mathematical physics. Institute of Mathematics SB AS USSR, Novosibirsk (in Russian) 23. Gorlov SI (1999) Prikl Mekh Tekh Fiz 40(6):25–34 (in Russian) 24. Zhitnikov VP, Sherykhalina NM, Sherykhalin OI (2000) Prikl Mekh Tekh Fiz 41(1):120–129 (in Russian) 25. Afanasiev KE (1997) Solution of non-linear problems on hydrodynamics of the ideal fluid with free boundaries by the methods of finite and boundary elements. Thesis for the degree of Doctor in Physics and Mathematics. Kemerovo (in Russian) 26. Afanasiev KE, Stukolov SV (1999) Comput Techn 4(3):3–16 (in Russian) 27. Stukolov SV (1999) Numerical modeling of solitary steady waves on the surface of a fluid with finite depth. 
In: Mathematical problems on mechanics of continuous media. Institute of Hydrodynamics SB RAS, Novosibirsk (in Russian) 28. Sturova IV (1990) Numerical calculations in the problems of generation of plane surface waves. Preprint 12. Computing Center SB AS USSR, Krasnoyarsk (in Russian) 29. Afanasiev KE (1996) Approximation of a non-linear solitary wave. In: VI Scientific School High Speed Hydrodynamics. Chuvash State University, Cheboksary (in Russian) 30. Stukolov SV (1999) Solution of non-linear wave problems on hydrodynamics of the ideal fluid by the complex boundary element method. Thesis for the degree of Candidate in Physics and Mathematics. Kemerovo (in Russian) 31. Afanasiev KE, Stukolov SV (2000) Appl Mech Tech Phys 41(3):470–478 32. Korotkov GG (2000) Numerical investigation of flow over obstructions by a vortical fluid with free boundaries. In: Works of N.I. Lobachevski Mathematical Center, Kazan 7:164–168 (in Russian) 33. Afanasiev KE, Stukolov SV (1996) Modeling of capsizing waves by the complex boundary element method. In: VI Scientific School High Speed Hydrodynamics. Chuvash State University, Cheboksary (in Russian) 34. Grosenbaugh M, Yeung R (1989) Fluid Mech 209:57–75

35. Suzuki K (1989) Calculation of nonlinear water waves around d2-dimensional body in uniform flow by means of boundary element method. In: Fifth International Conference on Numerical Ship Hydrodynamics, Part 1 36. Afanasiev K.E. (1998) Comput Techn 3(1):3–11 (in Russian) 37. Peregrine D (1987) New Developments in Foreign Science. Moscow: Mir. 42:37– 71 38. Bukreev VI, Turanov NP (1996) Appl Mech Tech Phys 37(6):44–50 39. Manoilin SV (1989) Some experimental theoretical methods for determination of influence of tsunami waves on hydrotechnical constructions and defined water areas in sea ports. Preprint 5. Computing Center SB AS USSR. Krasnoyarsk (in Russian) 40. Synolakis CE (1987) J Fluid Mech 185:523–545 41. Frank AM (1989) Prikl Mekh Tekh Fiz 3:95–101 (in Russian) 42. Frank AM (1993) Prikl Mekh Tekh Fiz 5:15–24 (in Russian) 43. Protopopov BE, Sturova IV (1989) Prikl Mekh Tekh Fiz 1:125–133 (in Russian) 44. Protopopov BE (1990) Izv Akad Nauk, Mekh Zhidk Gaza 5:115–123 (in Russian) 45. Ruziev RA, Khakimzyanov GS (1992) Comput Techn, Institute of Computational Technologies SB RAS. 1(1):5–21 (in Russian) 46. Seabra-Santos FJ, Renouard DP, Temperville AM (1987) Fluid Mech 176:117– 134 47. Shokin YI, Ruziev RA, Khakimzyanov GS (1990) Numerical modeling of plane potential streams of fluid with surface waves. Preprint 12. Computing Center SB AS USSR. Krasnoyarsk (in Russian) 48. Goda Y, Morinobu K (1998) Coastal Eng 40(4):307–326 49. Landrini M, Tyvand PA (2001) J Eng Math 39:131–170 50. Yasuda T, Mutsuda H, Mizutani N, Matsuda H (1999) Coastal Eng 41(3&4):269–280 51. Kawasaki K (1999) Coastal Eng 41(3&4):201–223 52. Kim JW, Bai KJ, Ertekin RC, Webster WC (2002) A strongly-nonlinear model for water waves in water of variable depth - the irrotational Green-Naghdi model. In: 21 International Conference on Offshore Mechanics and Offshore Engineering. Oslo, Norway 53. Gotoh H, Sakai T (1999) Coastal Eng 41(3&4):303–326 54. 
Guler I, Behr M, Tezduyar T (1999) Comp Mech 23:117–123 55. Tulin MP (2002) Future directions in the study of non-conservative water wave systems. In: 21 International Conference on Offshore Mechanics and Offshore Engineering. Oslo, Norway 56. Afanasiev KE, Stukolov SV (1998) Moving of a solitary wave onto an inclined shore. In: Bulletin of the Omsk University 3:9–12 (in Russian) 57. Afanasiev KE, Stukolov SV (1999) Appl Mech Tech Phys 40(1):27–35 58. Cooker MJ, Peregrine DH, Vidal C, Dold JW (1990) Fluid Mech 215:1–22 59. Sidyakin EV, Stukolov SV (2000) Calculation of loads of surf waves onto hydrotechnical constructions. Kemerovo State University Bulletin, Mathematics series 4:134–140 (in Russian) 60. Khakimzyanov GS, Shokin YuI, Barakhnin VB, Shokina NYu (2001) Numerical modelling of fluid flows with surface waves. Publishing House of SB RAS, Novosibirsk (in Russian) 61. Park J-C, Kim M-H, Miyata H (2001) J Mar Sci Tech 6:70–82

62. Afanasiev KE, Korotkov GG (2002) Evolution of a semi-circular cavity on the free surface in the plane and axis-symmetrical cases. In: International Conference on High Speed Hydrodynamics. Cheboksary 63. Korotkov GG, Stukolov SV (1998) Package of applied programs for wave problems on hydrodynamics. In: Scientific School-Conference on Mathematical Modelling: Geometry and Algebra. Kazan 64. Braess H, Wriggers P (2000) Comput Methods Appl Mech Eng 190:95–109 65. Elmore PA, Chahine GL, Oguz HN (2001) Exp Fluids 31:664–673 66. Ogus HN, Prosperetti A (1990) Fluid Mech 219:143–179 67. Tuck EO (2000) Numerical solution for unsteady two-dimensional free-surface flows. In: 11th Biennual Compational Techniques and Applications Conference, World Scientific 68. Best JP, Blake JR (1994) Fluid Mech 261:75–93 69. Blake JR (1988) J Aust Math Soc Ser B30:127–146 70. Plesset MS, Chapman RB (1972) Fluid Mech 47(II):283–290 71. Terentiev AG, Afanasiev KE (1987) Numerical methods in hydrodynamics. Educational manual. Cheboksary: Chuvash State University (in Russian) 72. Voinov OV (1979) Prikl Mekh Tekh Fiz 3:94–99 (in Russian) 73. Blake JR, Gibson DC (1981) Fluid Mech 111:123–140 74. Chahine GL (1979) Trans ASME: Ser.I, J Fluids Engng 99:706–716 75. Cole R (1950) Underwater explosions. Inostrannaya Literatura, Moscow 76. Blake JR, Gibson DC (1987) Fluid Mech 19:99–123 77. Cerone R, Blake JR (1984) J Austral Math Soc. Ser.B 26:31–44 78. Domermuth GD, Yue DKP (1987) Fluid Mech 178:195–219 79. Kedrinskij VK (1978) Surface effects at an underwater explosion (review). J Appl Mech Tech Phys 5 80. Mitchell TM, Hammit FG (1973) Trans ASME J Basic Engng I95 29 81. Plesset MS, Prosperetti A (1977) Annu Rev Fluid Mech 9:145–185 82. Voinov VV, Voinov OV (1975) Prikl Mekh Tekh Fiz 1:89–95 (in Russian) 83. Voinov VV, Voinov OV (1976) Papers of AS USSR 227:63–66 (in Russian) 84. Shima A, Nakajama K (1977) Fluid Mech 80(2):369–391 85. 
Benjamin TB, Ellis AT (1966) Phil Trans R Soc Lon A 260:221–240 86. Blake JR, Robinson PB, Shima A, Tomita Y (1993) Fluid Mech 255:707–721 87. Lauterborn W, Bolle H (1975) Fluid Mech 72:391–399 88. Ligneul P (1987) Phys Fluids 30(7):2280–2283 89. Shima A, Sato Y (1979) Ing-Arch 48(2):85–95 90. Shima A, Takajama K, Tomita Y, Ohsawa N (1983) AIAA J 21:55–59 91. Shima A, Tomita Y, Gibson DC, Blake JR (1989) Fluid Mech 203:199–214 92. Tomita Y, Sato K, Shima A (1994) Bubble Dyn Inter Phen:33–45 93. Best JP, Kucera A (1992) Fluid Mech 245:137–154 94. Shopov PJ, Minev PD (1992) Fluid Mech 235:123 –141 95. Blake JR, Taib BB, Doherty G (1986) Fluid Mech 170:479–498 96. Blake JR, Taib BB, Doherty G (1987) Fluid Mech 181:197–212 97. Terentiev AG, Afanasiev KE, Afanasieva MM (1989) Simulation of unsteady free surface flow problems by the direct boundary method. In: Advanced boundary element methods IUTAM Symposium, Springer-Verlag 98. Levkovski YL (1978) Structure of cavitation streams. Sudostroenie, Leningrad (in Russian) 99. Prosperetti A (1977) Meccanica 12(4):214–235

100. Goudov AM (1995) Numerical modeling of interaction of a bubble with different types of boundaries in a fluid. Thesis for the degree of Candidate in Physics and Mathematics. Kemerovo (in Russian) 101. Goudov AM (1997) Computat Techn 2(4):49–59 (in Russian) 102. Afanasiev KE, Afanasieva MM, Terentiev AG (1989) Deformation of gas bubbles in fluid. In: Actual problems on hydrodynamics. Chuvash State University, Cheboksary (in Russian) 103. Afanasiev KE, Goudov AM (1996) Evolution of a chain of three bubbles in the infinite fluid. In: Dynamics of continuous media with free boundaries. Cheboksary: Chuvash State University (in Russian) 104. Kedrinskij VK (2000) Hydrodynamics of explosion. Experiment and models. SB RAS, Novosibirsk (in Russian) 105. Wu C, Hwang Ned HC, Lin YK (2001) Cardiovascular Eng: Int J 1(4):171–176 106. Chahine G, Duraiswami R, Rebut M (1992) Analytical and numerical study of large bubble / bubble and bubble / flow interactions. In: Bubble dynamics and interface phenomena, Kluwer Academic Publihers 107. Chahine GL (1994) Strong interactions bubble/bubble and bubble/flow. In: Bubble dynamics and interface phenomena, Kluwer Academic Publihers 108. Wang QX (1998) Theoret Comput Fluid Dynamics 12:29–51 109. Xi W-Q, Kwee P-E, Hock T-B (1998) Numerical Simulation of Evolution of Three-Dimensional Bubble. In: 22nd Symposium on Naval Hydrodynamics. Washington, D.C. Preprints 110. Lavrentiev MA, Shabat BV (1977) Problems of hydrodynamics and their mathematical models. Science, Moscow (in Russian) 111. Afanasiev KE, Grigorieva IV (2000) Investigation of the evolution of a spatial cavitation bubble near a solid wall in the ideal incompressible fluid in presence of surface tension. In: Works of N.I. Lobachevski Mathematical Center, Kazan 7:38–42 (in Russian) 112. Afanasiev KE, Grigorieva IV (2001) Interaction of a spatial gas-steam bubble with solid walls in the ideal incompressible fluid in presence of surface tension. 
In: Scientific Practical Conference “Informational Resources of Kuzbass”, Kemerovo 113. Afanasiev KE, Grigorieva IV (2002) The Investigation of buoyant gas bubble dynamics near an inclined wall. In: International Conference on High Speed Hydrodynamics. Cheboksary (in Russian) 114. Grigorieva IV (2000) Investigation of the evolution of a spatial gas bubble in the ideal incompressible fluid. In: Kemerovo State Univercity Bulletin, Mathematics series. Kemerovo 4:123–128 115. Grigorieva IV (2000) Peculiarities of numerical solution of spatial problems on dynamics of the ideal incompressible fluid by the boundary element method. In: International Scientific Conference “Modeling, Calculations, Designing in Conditions of Vagueness – 2000”. Ufa 116. Knapp R, Daily G, Hammitt F (1974) Cavitation. Mir, Moscow 117. Lamb G (1947) Hydrodynamics. Gostechizdat, Moscow 118. Pritchett GW (1974) Calculations of phenomena during underwater explosions in conditions of incompressibility. In: Underwater and underground explosions. Moscow: Mir 119. Chubarov LB (2000) Numerical modeling of tsunami waves. Thesis for the scientific degree of Doctor in Physics and Mathematics. Novosibirsk (in Russian)

120. Afanasiev KE, Dolaev RR (2002) Program complex “AKORD” for support of calculating experiments. In: International Conference on High Speed Hydrodynamics. Cheboksary 121. Afanasiev KE, Goudov AM, Grigorieva IV, Korotkov GG, Dolaev RR, Berezin EN (2000) Integrated system for support of numerical experiments “AKORD”. In: Kemerovo State Univercity Bulletin, Mathematics series. Kemerovo 4:82–91 (in Russian) 122. Afanasiev KE, Korotkov GG, Dolaev RR (2000) Comput Techn 5(1):5–18 123. Afanasiev KE, Goudov AM, Korotkov GG, Dolaev RR, Berezin EN (2000) Shared package of application programs “AKORD” for support of calculating experiments. In: International Scientific Conference “Modeling, Calculations, Designing in Conditions of Vagueness – 2000”. Ufa (in Russian) 124. Grigorieva IV, Goudov AM (1999) Comput Techn 4(6):68–76 (in Russian) 125. Voevodin VV, Voevodin VlV (2002) Parallel calculations. BHV-Petersburg, St-Petersburg (in Russian) 126. Afanasiev KE, Stukolov SV (2003) Electronic educational methodical complex “Multiprocessor computing systems and parallel programming”. In: Scientific Methodical Conference “Telematics–2003”. St-Petersburg 127. Afanasiev KE, Stukolov SV, Demidov AV, Malyshenko VV (2003) Multiprocessor computing systems and parallel programming. Kemerovo State University. Kuzbassvuzizdat, Kemerovo (in Russian)

Simulation and optimisation for hydro power

E. Göde

Institute for Fluid Mechanics and Hydraulic Machinery, University of Stuttgart, Pfaffenwaldring 10, 70550 Stuttgart, Germany

Summary. The paper is an overview of the united efforts of Institute of Computational Technologies of SB RAS (Novosibirsk, Russia), Aston University (Birmingham, United Kingdom) and Institute of Automation and Electrometry of SB RAS (Novosibirsk, Russia) in the field of mathematical modeling of dispersion-managed (DM) solitons in transmission fiber lines. The most widely used mathematical models of dispersion-managed solitons as well as corresponding numerical techniques are discussed. Some results of numerical simulation for a number of important practical dispersion maps are presented.

1 Introduction

Hydro power is the most important renewable energy source on earth. The fuel, water, is free of charge, and the production of greenhouse gases (mainly CO2) in generating electric energy in a hydroelectric power station is negligible. Hydro power generating stations are long-term installations and can be used for 50 years and more. However, care must be taken to guarantee smooth and safe operation over the years. Maintenance is required, and critical parts of the machines have to be replaced when necessary. In modern engineering, numerical flow simulation plays an important role in optimising hydraulic turbines together with the connected components of the plant. Especially for the rehabilitation and upgrading of existing power plants, important points of concern are: to predict the power output of the turbine; to achieve maximum hydraulic efficiency; to avoid or minimize cavitation; and to avoid or minimize vibrations over the whole range of operation. Vibrations are not only caused by the interaction of the stationary and rotating parts of the machines. Depending on the operating conditions, vortices can be generated in the through flow, leading to

pressure pulsations. Flow simulation can help to solve operational problems and to optimise the turbomachinery of hydro electric generating stations or their components through: intuitive optimisation; mathematical optimisation; parametric design; reduction of cavitation through modern design; prediction of the draft tube vortex; and troubleshooting by means of simulation. Some examples are given in the following.

2 Draft tube vortex

One important component of a hydraulic turbine is the draft tube, which is in fact a diffuser. The draft tube is used to recover the kinetic energy downstream of the turbine runner. For low-head generating stations the kinetic energy at the runner outlet can amount to 50 % or more of the total hydraulic energy. A well-performing draft tube is therefore essential to achieve high efficiencies in a hydro electric power plant. However, at off-design operating conditions the flow in the draft tube contains not only a through-flow component equivalent to the discharge but also a swirl component. This type of flow can roll up into a vortex that rotates at a fraction of the rotational speed of the runner. Normally the rotational speed of the vortex lies between 30 % and 50 % of the runner speed. The corresponding pressure field rotates at the same speed, causing noise and vibration in the adjacent walls of the building. To simulate the draft tube vortex realistically, standard turbulence modelling is not sufficient. Because the turbulence in the vortical flow field is not isotropic,

Fig. 1.


Fig. 2.

isotropic turbulence models such as the standard k-ε model are not able to resolve the vortex. Appropriate modelling is required [1].

3 Trouble shooting for a 69 MW hydro electric power station

Marsyangdi, the largest hydroelectric power station in Nepal, had suffered from severe power swings on two of its three 23 MW Francis turbines ever since its erection a decade earlier. The swings were caused by vortices in the trifurcation on the pressure side, which had not been known before. A solution was found through numerical flow simulation of the existing trifurcation and of its modification, in cooperation with ASTR, Graz, Austria, which performed the corresponding model tests. After the modification of the trifurcation the power plant operates smoothly; in addition, the total output of the plant was increased by 5 % [2].

4 Flow separation at turbine intake

At Tübingen, Germany, a run-of-river power station with two identical bulb turbines showed operational problems. After a couple of years of operation


Fig. 3.

Fig. 4.

one of the turbines was shut down due to fatal seal damage caused by unsteady vortices generated at the intake. A solution was found through numerical flow simulation of the existing intake as well as of the modified intake contour [3]. The delicate point was that the numerical simulation was first carried out as a steady-state computation. However, convergence could not be achieved, even with different computational meshes. After switching to unsteady mode it turned out that flow separation at the intake leading edge (lower graphic) generated vortices running into one turbine channel (upper graphic). The modification of the intake eliminated the flow separation at the inlet edge, and both turbines now operate smoothly.

5 Modernisation of existing power plants

The modernization of existing hydroelectric power plants is a very attractive alternative for developing the already existing hydro potential in our countries. While the realization of new generating stations creates opposition


Fig. 5.

in society, many old and ageing power plants offer considerable potential to increase power output. Through the replacement of important turbine components, in most cases the wicket gate and the runner, often more than one of the following improvements can be realized: increase of the hydraulic efficiency of the turbine; increase of turbine performance through more capacity; reduction of cavitation; reduction of noise and vibration.

A small hydro power plant consisting of four Francis turbines with an existing total capacity of 1200 kW at 7.9 m rated head has been upgraded. The 80-year-old power plant Kiebingen is located in the south-western part of Germany and is run by EnBW (Energie Baden-Württemberg). As in most old power plants, Francis turbines were used for the initial design, and the construction was carried out without a spiral casing. The maximum power output per turbine was roughly 300 kW instead of the 340 kW guaranteed by the supplier. In recent years EnBW decided to find out the reason for the shortcoming of the existing turbines. An initiative was started, including CFD simulations, to discover the potential for an upgrade. Then a new propeller runner was developed to replace the existing Francis runners. In addition, the wicket gate was enlarged. The result was an improvement of the turbine performance at full load from 300 kW to 400 kW per turbine at rated head. The total energy production will be increased from some 6.5 GWh/a to at least 8 GWh/a [4].
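The upgrade figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that only uses the numbers stated in the text; the variable and function names are illustrative assumptions.

```python
# Figures quoted in the text for the Kiebingen upgrade.
N_TURBINES = 4
P_OLD_KW = 300.0    # measured full-load output per turbine before the upgrade
P_NEW_KW = 400.0    # full-load output per turbine after the upgrade
E_OLD_GWH = 6.5     # annual energy production before the upgrade ("some 6.5 GWh/a")
E_NEW_GWH = 8.0     # lower bound quoted for the upgraded plant ("at least 8 GWh/a")


def relative_gain(old, new):
    """Return the relative increase (new - old) / old."""
    return (new - old) / old


plant_old_kw = N_TURBINES * P_OLD_KW   # matches the stated total capacity of 1200 kW
plant_new_kw = N_TURBINES * P_NEW_KW
power_gain = relative_gain(P_OLD_KW, P_NEW_KW)
energy_gain = relative_gain(E_OLD_GWH, E_NEW_GWH)

print(f"plant capacity: {plant_old_kw:.0f} kW -> {plant_new_kw:.0f} kW")
print(f"full-load power gain: {100 * power_gain:.1f} %")
print(f"annual energy gain (lower bound): {100 * energy_gain:.1f} %")
```

The full-load gain is larger than the gain in annual energy, which is plausible because the plant does not run at full load all year; the text itself gives no breakdown of operating hours.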


Fig. 6.

6 Parametric turbine design in a VR environment

Hydraulic turbines usually have to be designed individually according to the local operating conditions of the power station, such as discharge, head and the given geometrical situation. This requires a tailor-made design, mainly of the turbine runner. The shape of the runner is rather complicated, and understanding the complex geometry, and especially the spatial structures of the flow, is sometimes very difficult. The initial design is based on the Euler equation for turbomachinery. By use of this equation it is possible to break down the runner shape into a set of design parameters such as head, discharge, speed, number of blades, thickness of the profiles and so on. Since the Euler equation is relatively simple, and so is the corresponding algorithm to generate the runner shape (if done right), the 'parametric runner design' can be made in real time. Then the geometry is fine-tuned using CFD (computational fluid dynamics), that is, with flow simulation, in order to minimize cavitation and maximize hydraulic efficiency [5]. This is to a great extent an intuitive optimisation process, which sometimes requires a designer with many different skills and considerable experience.
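For orientation, the Euler equation for turbomachinery referred to above is usually written in the following form for a turbine. The notation is an assumption, since the text does not write the equation out: subscripts 1 and 2 denote runner inlet and outlet, $u = \omega r$ is the circumferential blade speed, $c_u$ is the circumferential component of the absolute velocity, $H$ is the head and $\eta_h$ the hydraulic efficiency.

```latex
\[
  g \, H \, \eta_h \;=\; u_1 c_{u1} - u_2 c_{u2},
  \qquad u_{1,2} = \omega \, r_{1,2} .
\]
```

For a given head, discharge and speed, this relation fixes the change of swirl that the blades have to produce, which is why a runner shape can be generated from a small set of design parameters.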


Fig. 7.

7 Mathematical geometry optimisation

For a given task the intuitive design optimisation process usually requires a lot of experience and is, in addition, often time-consuming. Where it is applicable, a mathematical optimization procedure can be very useful. Provided that the contour to be optimised is defined by a set of parameters, the parameters can be changed automatically following a suitable strategy. To decide which change was right or wrong, a quality function must be specified that answers the question: "What is good?". For a simplified draft tube (circular cross sections only) an automatic optimisation procedure was developed that allows 6 cross sections (parameters) to be changed. After each change of a parameter a new draft tube contour was generated, together with a new computational grid. Then the subsequent flow simulation was evaluated as to whether the pressure recovery was higher than before. Using the strategy EXTREM it was possible to find a draft tube shape with a considerably higher pressure recovery than that of the start configuration. More importantly, the area distribution from inlet to outlet of the optimised draft tube agrees with the state-of-the-art design rule for this important


Fig. 8.

turbine component: in the straight parts of the diffuser the fluid should be decelerated as much as possible, whereas within the bend it should not be decelerated, or should sometimes even be slightly accelerated [6].
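The loop described above (change one parameter, regenerate the contour, evaluate the quality function, keep the change if it is better) can be sketched as follows. This is a simple coordinate search written for illustration; it is not the EXTREM strategy used in [6], and the quality function is a toy stand-in for the CFD evaluation of the pressure recovery. All names and numbers are assumptions.

```python
# Hypothetical "best" values for the 6 cross-section parameters (toy data only).
TARGET = [1.1, 1.3, 1.6, 1.6, 1.9, 2.3]


def toy_quality(params):
    """Toy stand-in for 'pressure recovery from a flow simulation'.

    Higher is better; the maximum (0.0) is reached at TARGET.
    """
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def coordinate_search(quality, start, step=0.5, min_step=1e-4, max_sweeps=200):
    """Change one parameter at a time and keep a change only if quality improves."""
    x = list(start)
    best = quality(x)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                q = quality(trial)      # in practice: new contour, new grid, new simulation
                if q > best:
                    x, best, improved = trial, q, True
                    break
        if not improved:
            step *= 0.5                 # refine the search once no single change helps
            if step < min_step:
                break
    return x, best


optimised, best_quality = coordinate_search(toy_quality, start=[1.0] * 6)
print([round(v, 3) for v in optimised], best_quality)
```

In a real application each call to the quality function costs a full flow simulation, so the number of evaluations, rather than the bookkeeping shown here, dominates the effort.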

References

1. Ruprecht A, Helmrich T, Aschenbrenner T, Scherer T (2002) Simulation of vortex rope in a turbine draft tube. 21st IAHR Symposium on Hydraulic Machinery and Systems, Lausanne
2. Ruprecht A, Neubauer R, Helmrich T (2003) Simulation of vortex instability in a pipe trifurcation. ASME/JSME Joint Fluids Engineering Conference, Symposium on Applications in Computational Fluid Dynamics, Hawaii
3. Ruprecht A, Göde E (1997) Wirbelablösung am Einlauf eines Kraftwerks. VisEng 97, Stuttgart
4. Göde E, Ittel G (2003) Upgrading of a low head small hydro power plant by replacing Francis runners with fixed bladed propellers. WaterPower XIII, Buffalo NY
5. Göde E, Kaps A, Ruprecht A, Woessner U (1999) Hydro turbine design in a VR environment. ISIMADE'99, Baden-Baden
6. Eisinger R, Ruprecht A (2002) Automatic shape optimization of hydro turbine components based on CFD. Seminar CFD for turbomachinery applications, Gdansk, 2001. In: TASK Quarterly 6(1):101–111

The analysis of behaviour of multilayered conic shells on the basis of nonclassical models

V.V. Gorshkov

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. A parametric analysis of the stressed-deformed state of a multilayered reinforced conic shell is carried out on the basis of vectorially linear and nonlinear variants of the classical and nonclassical theories. The influence of the reinforcement structure of the composite material, of the transverse shear of the binder, and of the order of arrangement of the reinforced layers on the behaviour of the shell is investigated. The numerical solutions obtained by the methods of spline collocation and discrete orthogonalization are compared with each other and with numerical results obtained by the method of invariant imbedding. The high efficiency of the numerical methods used is demonstrated on the solution of a boundary-value problem for stiff systems of differential equations.

1 Introduction

Thin-walled shells are major elements of many modern structures and play a leading part in aircraft building, shipbuilding, mechanical engineering, and the petroleum, gas and chemical industries. The possibilities for using shells have been considerably extended by the appearance of composite materials (CM). Owing to their lightness, strength and rigidity, CM substantially surpass the traditional metals and alloys in specific characteristics. Because their internal structure can be varied, CM open up wide possibilities for controlling the stressed-deformed state (SDS) of structures, thus providing the best conditions for their operation. The growing requirements on the strength and reliability of modern structures make it necessary to consider vectorially nonlinear and nonclassical theories of shells alongside the classical linear theory. The systems of equations describing the behaviour of shells are stiff, and their solutions exhibit strong edge effects. The numerical solution of such equations is hampered by computational instability. Therefore, the choice of numerical methods and the reliability of the numerical solutions obtained are important problems.


2 Formulation of the problem

A multilayer conic shell of constant thickness h made of a fibrous composite material is considered; 2α is the apex angle of the cone; a, b are the coordinates of the left and right edges of the shell. The elastic properties of a reinforced layer are described by the structural model of CM with two-dimensional fibres [1]. The SDS of the conic shell is studied on the basis of linear and nonlinear variants of the classical Kirchhoff–Love theory [2], the Timoshenko theory [3] and the Andreev–Nemirovskii theory [4]. The resolving system of equations describing the SDS of the conic shell has the form

$$\frac{dy(\xi)}{d\xi} = A(\xi, y(\xi)) + b(\xi), \quad \xi \in [0, 1], \qquad G_0\, y(0) = g_0, \quad G_1\, y(1) = g_1. \tag{1}$$

Here y(ξ) is the vector of resolving functions, ξ = s/b, s ∈ [a, b]. The system (1) is nonlinear; it is of order 8 in the case of the Andreev–Nemirovskii theory [4] and of order 6 in the cases of the Kirchhoff–Love and Timoshenko theories. The behaviour of the conic shell is investigated as a function of the structural and mechanical parameters of the CM, the geometrical theory used, and the arrangement of the reinforced layers. The numerical solution of the boundary-value problem (1) is obtained by the methods of spline collocation [5] and discrete orthogonalization [6].

3 The analysis of efficiency of numerical methods

The stiffness of the system (1) is especially pronounced in the case of the Andreev–Nemirovskii theory [4]. It is investigated here for a cylindrical shell (α = 0). If the longitudinal generalized force is known, for instance from the boundary conditions, then the right-hand side of the system (1) can be reduced to the linear form A(ξ, y(ξ)) = A(ξ)y(ξ). The eigenvalues of the matrix A(ξ) have the form

$$\lambda_{1,2,3,4} = \pm\mu \pm i\nu, \qquad \lambda_{5,6} = \pm q, \qquad \sqrt{\mu^2 + \nu^2} \ll q, \quad q \gg 1,$$

where μ, ν are the real and imaginary parts of the complex eigenvalues and ±q are the real eigenvalues. The presence of the real eigenvalues in the nonclassical case leads to the appearance of the exponential functions $e^{q(\xi-1)}$, $e^{-q\xi}$ in the solution alongside the functions $e^{\mu(\xi-1)}\cos(\nu\xi)$, $e^{\mu(\xi-1)}\sin(\nu\xi)$, $e^{-\mu\xi}\cos(\nu\xi)$, $e^{-\mu\xi}\sin(\nu\xi)$. These exponential functions are significant only in small neighbourhoods of the edges ξ = 1 and ξ = 0 and decay quickly with distance from them (Fig. 1a). The strongly pronounced edge effect in the solution is caused by the presence of such functions. On the other hand, the matrix of

Fig. 1. (a) The values of the exponential functions; (b) the condition number Λ* = max_ξ Λ(ξ) of the matrices A(ξ) for the three-layer reinforced cylindrical shell

coefficients of the system of equations is ill-conditioned. The eigenvalues for a three-layer cylindrical shell with rigid covers and various homogeneous layers are given in Table 1, where Ec1 = Ec3 = 30Ec2 and Ecn is Young's modulus of the material of the n-th layer. It follows from Table 1 that the real eigenvalues are much greater not only than unity but also than the moduli of the complex eigenvalues. As a result, the condition number of the matrix A(ξ) becomes much greater than unity.

Table 1. The eigenvalues of the matrix A for a three-layer cylindrical shell with rigid covers and various homogeneous layers

R/h      10       20       30       50       100       200
q        139.0    278.3    417.6    696.1    1392.2    2784.4
μ        7.6      10.1     12.1     15.3     21.4      29.9
ν        5.3      8.4      10.7     14.3     20.6      29.5
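The separation of scales visible in Table 1 can be quantified directly. The following minimal sketch, using only the tabulated values, computes the ratio of the real eigenvalue q to the modulus of the complex eigenvalues, and the size of the edge function exp(-qξ) a short distance from the edge. The evaluation point ξ = 0.1 and all names are illustrative assumptions.

```python
import math

# Values copied from Table 1.
R_OVER_H = [10, 20, 30, 50, 100, 200]
Q = [139.0, 278.3, 417.6, 696.1, 1392.2, 2784.4]
MU = [7.6, 10.1, 12.1, 15.3, 21.4, 29.9]
NU = [5.3, 8.4, 10.7, 14.3, 20.6, 29.5]

# Ratio of the real eigenvalue to the modulus of the complex eigenvalues.
ratios = [q / math.hypot(mu, nu) for q, mu, nu in zip(Q, MU, NU)]

# Size of the edge function exp(-q*xi) at xi = 0.1 (illustrative point).
edge_values = [math.exp(-q * 0.1) for q in Q]

for r_h, ratio, edge in zip(R_OVER_H, ratios, edge_values):
    print(f"R/h = {r_h:3d}: q/|mu + i nu| = {ratio:6.2f}, exp(-0.1 q) = {edge:.2e}")
```

The ratio grows with R/h, so the thinner the shell, the wider the spread of scales that a numerical method has to resolve; the rapidly decaying edge functions are what make a uniform treatment of the interval difficult.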

Fig. 1b shows the dependence of the condition number Λ* = max_ξ Λ(ξ) of the matrices A(ξ) for the three-layer reinforced cylindrical shell on the parameters γ = R/h and Ω = Ea1/Ec1, where R is the radius of the cylindrical shell and Ea1 is Young's modulus of the reinforcement material. The figure shows that the condition number is two orders of magnitude greater


than unity. Moreover, the thinner the shell, the greater the condition number. For a multilayered cylindrical shell with constant structural parameters it is possible to obtain the analytical solution of the system (1) [7].

Table 2. The results obtained by the methods of spline collocation (package COLSYS) and discrete orthogonalization (package GMDO): relative error ε by components

                   W              W              Π              S1
COLSYS package
TOL = 10^-4        7.51·10^-6     1.35·10^-7     1.39·10^-5     7.62·10^-7
TOL = 10^-8        3.22·10^-9     1.72·10^-10    5.72·10^-9     8.30·10^-10
GMDO package
J = 600            6.32·10^-5     1.12·10^-6     1.16·10^-4     6.39·10^-6
J = 1200           4.56·10^-6     8.11·10^-8     8.39·10^-6     4.62·10^-7

Table 2 compares the results obtained by the methods of spline collocation (package COLSYS) and discrete orthogonalization (package GMDO) with the analytical solution for a three-layer cylindrical shell with homogeneous layers. Here ε is the relative error in the uniform metric; W and Π are the dimensionless deflection and the kinematic characteristic that takes the transverse shear into account; S1 is the dimensionless generalized force; TOL is the accuracy set in the package COLSYS; J is the total number of grid elements in the integration procedure of the method of discrete orthogonalization. It follows from Table 2 that the numerical solutions practically coincide with the analytical ones, which indicates the high efficiency of the numerical methods used. As an additional experiment, the SDS of a cylindrical shell with R/h = 200 was calculated. The spectral radius of the matrix of the system is λ = 2784.4. Both methods successfully compute the solution of this problem with the calculation parameters TOL = 10^-8 for the package COLSYS and J = 4000 for the method of discrete orthogonalization. For instance, the maximal relative errors for the function Π are 1.52·10^-8 and 1.26·10^-5 for the packages COLSYS and GMDO, respectively.
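The GMDO rows of Table 2 also allow a rough estimate of how fast the error of the discrete orthogonalization falls as the grid is refined. Assuming the error behaves like C·J^(-p), doubling J gives p = log2(ε(600)/ε(1200)). The sketch below applies this to the four tabulated components; the power-law assumption and all names are illustrative and are not statements made in the paper.

```python
import math

# Relative errors of the GMDO package from Table 2, in column order.
ERR_J600 = [6.32e-5, 1.12e-6, 1.16e-4, 6.39e-6]
ERR_J1200 = [4.56e-6, 8.11e-8, 8.39e-6, 4.62e-7]


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Estimate p in err ~ C * J**(-p) from two grids differing by 'refinement'."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


orders = [observed_order(e1, e2) for e1, e2 in zip(ERR_J600, ERR_J1200)]
print([round(p, 2) for p in orders])
```

All four components give nearly the same estimate, slightly below 4, which suggests a consistent convergence rate for this test problem under the stated assumption.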

4 Calculation of SDS of a conic shell

The three-layer conic shell with homogeneous layers. We consider a three-layer rigidly clamped conic shell consisting of homogeneous layers and subjected to constant internal pressure.


Fig. 2 shows the maximal dimensionless deflections and the axial (solid lines) and circumferential (dashed lines) stresses as functions of the parameter Ω = Ec1/Ec2. Hereinafter, curve 1 corresponds to the values calculated using the linear classical theory, curve 2 to the linear Timoshenko theory, and curve 3 to the linear Andreev–Nemirovskii theory [4]. The calculations are carried out for α = 30°, b/h = 20, h1 = h3 = 0.1h. Fig. 2 shows that the results for the classical theory and the Timoshenko theory are close, whereas the difference between the results obtained using the classical and nonclassical theories is substantial. At Ω = 100 the relative difference is about 25% for the deflections, about 60% for the axial stresses, and no more than 5% for the circumferential stresses.

The two-layer reinforced conic shell. A two-layer reinforced rigidly clamped conic shell is considered. The internal layer is reinforced by the longitudinal family of reinforcement and the external layer by the circumferential family; α = 30°, b/h = 20, a/b = 0.2.

Table 3. Maximal deflections of the two-layer reinforced rigidly clamped conic shell

Ea/Ec    Wmax · 10^-2 (COLSYS)    Wmax · 10^-2 (orthogonalization method)    Wmax · 10^-2 (from Table 8.2.1 of [4])
1        0.72497135298            0.72497145742                              0.724
10       0.36316847943            0.36316846987                              0.364
30       0.18973283292            0.18973282987                              0.190
50       0.13064281112            0.13064281012                              0.132

Table 3 presents the maximal deflections calculated by the methods of spline collocation and discrete orthogonalization.

Fig. 2. The maximal dimensionless deflections and the axial (solid lines) and circumferential (dashed lines) stresses as functions of the parameter Ω = Ec1/Ec2


The results obtained by the method of invariant imbedding are given in the third column [4]. Table 3 shows good agreement between the results, which supports the reliability of the numerical solutions obtained.

The three-layer reinforced conic shell. The influence of the reinforcement structure and of the choice of the geometrical theory on the behaviour of a three-layer rigidly clamped conic shell subjected to constant internal pressure is investigated. The internal layer of the shell, of thickness h1, is reinforced by the longitudinal family of reinforcement; the middle layer, of thickness h2, by the circumferential family; and the external layer, of thickness h3, by spiral families of reinforcement at the angles ψ and −ψ; h1 = h3 = 0.1h. The geometrical parameters of the shell are the same as for Table 3. The maximal reduced stresses in the binder bs0 and in the longitudinal bs1 and spiral bs3 families of reinforcement, and the dimensionless deflections of the carbon-fibre plastic conic shell, are shown in Fig. 3 as functions of the spiral reinforcement angle ψ.

Fig. 3. The maximal reduced stresses in the binder bs0 and in the longitudinal bs1 and spiral bs3 families of reinforcement, and the dimensionless deflections of the carbon-fibre plastic conic shell, as functions of the spiral reinforcement angle ψ


Fig. 3 shows that the difference between the results obtained using the classical theory and the Timoshenko theory reaches 40% for the binder at ψ = 70° and 30% for the spiral reinforcement at ψ = 10°. For the deflections and for the stresses in the longitudinal reinforcement the difference does not exceed 5%. The results obtained using the classical and nonclassical theories differ by up to 40% for the longitudinal family of reinforcement and by up to 70% for the spiral family. It also follows from Fig. 3 that a suitable choice of the spiral reinforcement angle makes it possible to lower the stresses in the binder by a factor of 4 at the cost of a twofold increase of the stresses in the spiral family of reinforcement.

Fig. 4. The dimensionless intensity of stresses in the binder σ*0 and in the longitudinal σ*(1) and spiral σ*(3) families of reinforcement as functions of the ratio between the mechanical characteristics of the reinforcement and the binder

Fig. 4 shows the dimensionless intensity of stresses in the binder σ*0 and in the longitudinal σ*(1) and spiral σ*(3) families of reinforcement as functions of the ratio between the mechanical characteristics of the reinforcement and the binder. Here Ω = Ea1/Ec1, Ea1 = Ea2 = Ea3. The order of arrangement and the thicknesses of the layers correspond to the parameters for Fig. 3, ψ = 10°. Here σ*0 = σ0/P, σ*(n) = σ(n)/P (n = 1, 2, 3), where σ0, σ(n) are the intensities of stresses in the binder and in the n-th family of reinforcement. The difference between the results obtained using the classical theory and the Timoshenko theory reaches 40% at Ω > 30. The difference between the results obtained using the classical and nonclassical theories reaches, for example at Ω = 60, 37% for the stress in the binder and 60% and 40% for the stresses in the longitudinal and spiral reinforcement, respectively. The influence of the choice of the geometrical theory on the magnitude of the deflections is insignificant.

Fig. 5 shows the dependence of the maximal dimensionless deflections and of the reduced stresses in the binder and the spiral reinforcement of the carbon-fibre plastic shell on the spiral reinforcement angle and on the ratio of the layer thicknesses. The internal layer is reinforced by the spiral family of reinforcement, the middle layer by the circumferential family, and the external layer by the longitudinal family; h̄1 = h1/h, h1 = h3. It follows from Fig. 5 that a suitable choice of the reinforcement parameters makes it possible to reduce the deflections of the shell by a factor of 3 and the stress in the spiral reinforcement by a factor of 6.


Fig. 5. The dependence of the maximal dimensionless deflections and of the reduced stresses in the binder and the spiral reinforcement of the carbon-fibre plastic shell on the spiral reinforcement angle and on the ratio of the layer thicknesses

Fig. 6. The maximal reduced stresses in the elements of the composite material and the dimensionless deflections of the fibreglass plastic conic shell as functions of the spiral reinforcement angle for various arrangements of the reinforced layers

Fig. 6 presents the maximal reduced stresses in the elements of the composite material and the dimensionless deflections of the fibreglass plastic conic shell as functions of the spiral reinforcement angle for various arrangements of the reinforced layers. Curve 1 corresponds to a shell in which the internal layer is reinforced by the longitudinal family of reinforcement, the middle layer by the circumferential family, and the external layer by the spiral family at the angles ψ and −ψ. We denote this structure by (0°, 90°, ψ, −ψ). Curve 2 corresponds to the structure (0°, ψ, −ψ, 90°), and curve 3 to (90°, ψ, −ψ, 0°). Fig. 6 shows that placing the circumferential reinforcement in the internal layer is the most unfavourable arrangement for the binder (curve 3). If the circumferential reinforcement is placed in the external layer, the stresses in the binder and the spiral reinforcement decrease by a factor of 1.5 (curve 2). Placing the circumferential reinforcement in the middle layer makes it possible to reduce the stress in the binder by a factor of up to 2, owing to the redistribution of stresses and the better load-carrying action of the spiral reinforcement. The influence of the arrangement of the layers on the deflections is insignificant.


References

1. Nemirovskii YuV (1972) Mekhanika polimerov 5:861–873 (in Russian)
2. Novozhilov VV (1951) Theory of thin shells. Leningrad: Sudpromgiz (in Russian)
3. Grigorenko YaM, Vasilenko AT (1992) Static problems of anisotropic heterogeneous shells. Moscow: Nauka (in Russian)
4. Andreev AN, Nemirovskii YuV (2001) Multi-layer anisotropic shells and plates. Novosibirsk: Nauka (in Russian)
5. Ascher U, Christiansen J, Russell RD (1981) ACM Trans on Math Software 2:209–222
6. Godunov SK (1961) Usp Mat Nauk 3:171–174 (in Russian)
7. Golushko SK, Gorshkov VV (2002) Comput Techn, Bulletin KazNU 4(32):172–180 (in Russian)

Simulation of the motion and heating of an irregular plasma

N.A. Huber

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. Plasma heating and confinement is one of the attractive problems of plasma physics. Because of the variety of regimes, the wide range of parameters of the medium, and the complexity and nonlinearity of the processes involved, the problem of plasma heating and propagation is a multiparameter one that requires the use of various approaches. In the present work a differential model based on the three-fluid approach is presented, which is designed for computational experiments on the heating and motion of a dense ionized gas cloud (plasmoid) in a powerful magnetic field. The model takes into account such processes as ionization, heating, heat transfer, and the action of a relativistic electron beam. The model includes continuity and energy equations for all plasma components studied, equations of motion for the heavy particles, and magnetic-field equations. The action of the electric field is taken into account using simplifications that allow the relatively rapid electron motions to be excluded from the calculations, which makes the model more efficient. For the numerical solution of the problem an economical finite-difference scheme is developed, based on the factorization method with splitting into spatial directions and physical processes. The algorithm allows the equations of gas dynamics and of magnetic induction to be solved separately. The beam energy transfer is modelled using experimental data. Calculations of the propagation of a plasmoid heated by a source in an external magnetic field are performed. The mechanism by which the magnetic field and the heat source affect the plasmoid expansion is determined.

1 Formulation of the problem

We consider the propagation of a dense plasmoid composed of deuterium and lithium heavy particles (atoms and ions) and electrons, which is heated by an external source in a powerful magnetic field. The plasmoid is assumed to be axisymmetric, with a density several orders of magnitude higher than the density of the background plasma surrounding it. Under the action of the hydrodynamic and magnetic pressures and of the external heat source,


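the plasmoid begins to expand into the background plasma. The flow is considered axisymmetric, and it is simulated as the propagation of the plasmoid in a cylindrical volume filled with low-density plasma and located in a magnetic field directed along the z-axis. In the magnetohydrodynamic approximation, the hyperbolic-type system of equations for plasma heating and propagation can be written in vector form as follows.

Ion equations:

$$\frac{\partial n_{sD}}{\partial t} + \operatorname{div}\,(n_{sD}\mathbf{V}_i) = 0, \tag{1}$$

$$\frac{\partial n_{sL}}{\partial t} + \operatorname{div}\,(n_{sL}\mathbf{V}_i) = 0, \tag{2}$$

$$\frac{\partial \mathbf{V}_i}{\partial t} + (\mathbf{V}_i\cdot\nabla)\,\mathbf{V}_i = -\frac{1}{\rho_s}\cdot\nabla(p_i + p_e) + \frac{1}{\rho_s}\cdot\left(\frac{(\mathbf{H}\nabla)\,\mathbf{H}}{4\pi} - \frac{\nabla H^2}{8\pi}\right), \tag{3}$$

$$\frac{\partial p_i}{\partial t} + \operatorname{div}\,(p_i\mathbf{V}_i) + (\gamma - 1)\,p_i \operatorname{div}\mathbf{V}_i = (\gamma - 1)\operatorname{div}\left(\overleftrightarrow{\kappa}_i\cdot\nabla T_i\right) + (\gamma - 1)\,Q_i. \tag{4}$$

Here $n_{sD}$, $n_{sL}$ are the densities of the deuterium and lithium heavy particles, respectively; $\mathbf{V}_i$ is the heavy-particle velocity; $\rho_s = m_D n_{sD} + m_L n_{sL}$ is the mass density of the heavy particles; $\mathbf{H}$ is the magnetic field; $p_i = (n_{sD} + n_{sL})\,kT_i$ is the ion pressure; $T_i$ is the ion temperature; $k$ is the Boltzmann constant; $p_e = n_e kT_e$; $T_e$ is the electron temperature; $Q_i = \dfrac{1}{2}\,\dfrac{n_e m_e k\,(T_i - T_e)}{\tau_e M}$ is the source of ion heating due to electron-ion collisions; $n_e$ is the electron density; $m_e$ is the electron mass; $\tau_e$ is the time of electron scattering by ions; $M$ is the average ion mass; $\gamma = 5/3$ is the adiabatic exponent; $\overleftrightarrow{\kappa}_i$ is the ion thermal conductivity tensor with the components

$$\kappa_{i\parallel} = 3.9\,\frac{n_s kT_i\tau_i}{m_i} \quad \text{(longitudinal conductivity)},$$

$$\kappa_{i\perp} = \frac{n_s kT_i\tau_i}{m_i}\,\frac{2\,(\omega_i\tau_i)^2 + 2.6}{(\omega_i\tau_i)^4 + 2.7\,(\omega_i\tau_i)^2 + 0.677} \quad \text{(transverse conductivity)},$$

$$\kappa_{i\wedge} = \frac{n_s kT_i\tau_i}{m_i}\,\frac{(\omega_i\tau_i)\left(2.5\,(\omega_i\tau_i)^2 + 4.65\right)}{(\omega_i\tau_i)^4 + 2.7\,(\omega_i\tau_i)^2 + 0.677} \quad \text{("oblique" conductivity)}.$$

Electron equations:

$$n_e = Z_D n_{sD} f_D(T_e) + Z_L n_{sL} f_L(T_e),$$

$$\frac{\partial p_e}{\partial t} + \operatorname{div}\,(p_e\mathbf{V}_e) + (\gamma - 1)\,p_e \operatorname{div}\mathbf{V}_e = (\gamma - 1)\operatorname{div}\left(\overleftrightarrow{\kappa}_e\cdot\nabla T_e\right) + (\gamma - 1)\,Q_e, \tag{5}$$

$$\mathbf{V}_e = \mathbf{V}_i - \frac{c}{4\pi e n_e}\operatorname{rot}\mathbf{H}. \tag{6}$$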

Here $f_\alpha(T_e) = \min\left(\max\left\{0,\ (T_e - 1)/(T_\alpha^* - 1)\right\},\ 1\right)$ are the ionization laws for deuterium and lithium; $T_D^* = 3$ eV, $T_L^* = 15$ eV; $p_e = (Z_D n_D + Z_L n_L)\,kT_e$ is the


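electron pressure; $n_e = Zn_i = Z_D n_D + Z_L n_L$ is the electron density; $Z_D = 1$, $Z_L = 3$; $n_D = n_{sD} f_D(T_e)$.

Magnetic field equation:

$$\begin{aligned}
\frac{\partial \mathbf{H}}{\partial t} = {} & \operatorname{rot}\,[\mathbf{V}_i \times \mathbf{H}] - \frac{c^2}{4\pi}\operatorname{rot}\left(\overleftrightarrow{\sigma}^{-1}\cdot\operatorname{rot}\mathbf{H}\right) - \frac{c}{e}\operatorname{rot}\left(\overleftrightarrow{\chi}\cdot\nabla kT_e\right) - \frac{c}{e n_e}\,[\nabla n_e \times \nabla kT_e] \\
& + \frac{c}{4\pi e n_e^2}\left(\operatorname{rot}\mathbf{H}\cdot(\nabla n_e\cdot\mathbf{H}) - \mathbf{H}\cdot(\nabla n_e\cdot\operatorname{rot}\mathbf{H})\right) + \frac{c}{4\pi e n_e}\left\{(\operatorname{rot}\mathbf{H}\cdot\nabla)\,\mathbf{H} - (\mathbf{H}\cdot\nabla)\operatorname{rot}\mathbf{H}\right\}.
\end{aligned} \tag{7}$$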



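Here $\overleftrightarrow{\sigma}$ is the tensor of the force of friction between electrons and ions:

$$e n_e\,\overleftrightarrow{\sigma}^{-1}\mathbf{j} = \frac{m_e n_e}{\tau_e}\left\{0.44\left(\mathbf{V}_{i\parallel} - \mathbf{V}_{e\parallel}\right) + \left(\mathbf{V}_{i\perp} - \mathbf{V}_{e\perp}\right)\left[1 - \frac{5.52\,(\omega_e\tau_e)^2 + 0.56}{(\omega_e\tau_e)^4 + 10.8\,(\omega_e\tau_e)^2 + 1.05}\right]\right\},$$

and $\overleftrightarrow{\chi}$ is the thermal force tensor:

$$\overleftrightarrow{\chi}\cdot f = 0.91\,\nabla_\parallel f + \frac{4.45\,(\omega_e\tau_e)^2 + 0.95}{(\omega_e\tau_e)^4 + 10.8\,(\omega_e\tau_e)^2 + 1.05}\,\nabla_\perp f + \frac{(\omega_e\tau_e)\left(1.5\,(\omega_e\tau_e)^2 + 1.78\right)}{(\omega_e\tau_e)^4 + 10.8\,(\omega_e\tau_e)^2 + 1.05}\,[\mathbf{h}\times f],$$

where $\mathbf{h} = \mathbf{H}/|\mathbf{H}|$.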
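The piecewise-linear ionization laws and the resulting electron density defined in this section are easy to evaluate directly. The following minimal sketch assumes that temperatures are given in eV and densities in normalized units; the function names and the sample values in the usage line are illustrative assumptions.

```python
# Threshold temperatures and charge numbers as given in the text.
T_STAR_EV = {"D": 3.0, "L": 15.0}
CHARGE = {"D": 1, "L": 3}


def ionization_fraction(te_ev, t_star_ev):
    """f(Te) = min(max(0, (Te - 1)/(T* - 1)), 1): linear ramp between 1 eV and T*."""
    return min(max(0.0, (te_ev - 1.0) / (t_star_ev - 1.0)), 1.0)


def electron_density(n_sd, n_sl, te_ev):
    """n_e = Z_D n_sD f_D(Te) + Z_L n_sL f_L(Te)."""
    f_d = ionization_fraction(te_ev, T_STAR_EV["D"])
    f_l = ionization_fraction(te_ev, T_STAR_EV["L"])
    return CHARGE["D"] * n_sd * f_d + CHARGE["L"] * n_sl * f_l


# Illustrative evaluation: equal heavy-particle densities and Te = 2 eV (hypothetical value).
print(electron_density(500.0, 500.0, 2.0))
```

Below 1 eV both species are treated as neutral, and above the respective threshold they are treated as fully ionized, so the electron density follows the electron temperature field without a separate rate equation.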

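2 Numerical Algorithm

Using (6), the electron energy equation can be written in the following form:

$$\frac{\partial p_e}{\partial t} + (\mathbf{V}_i\cdot\nabla)\,p_e + \gamma\,p_e \operatorname{div}\mathbf{V}_i = (\gamma - 1)\operatorname{div}\left(\overleftrightarrow{\kappa}_e\cdot\nabla T_e\right) + (\gamma - 1)\,Q_e + \frac{c}{4\pi e}\,\nabla T_e\cdot\operatorname{rot}\mathbf{H} + (\gamma - 1)\,\frac{c}{4\pi e}\,p_e\,\nabla\frac{1}{n_e}\cdot\operatorname{rot}\mathbf{H}. \tag{8}$$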

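Thus, the electron velocity is excluded from the hydrodynamic equations. The system of hydrodynamic equations (1)–(4), (8) can be written in divergent form:

$$\frac{\partial \mathbf{U}}{\partial t} = -(\mathbf{W}_r + \mathbf{W}_z) + \mathbf{R} = -\mathbf{W}, \tag{9}$$

and in nondivergent form:

$$\frac{\partial \mathbf{f}}{\partial t} = -(\Omega_r + \Omega_z)\,\mathbf{f} + \mathbf{S}. \tag{10}$$

Here

$$\mathbf{U} = \left(n_{sD},\, n_{sL},\, \rho_i v_{ri},\, \rho_i v_{\varphi i},\, \rho_i v_{zi},\, p_i,\, p_e\right)^T, \qquad \mathbf{f} = \left(n_{sD},\, n_{sL},\, v_{ri},\, v_{\varphi i},\, v_{zi},\, p_i,\, p_e\right)^T,$$

$$\Omega_r \mathbf{f} = A^{-1}\mathbf{W}_r, \qquad \Omega_z \mathbf{f} = A^{-1}\mathbf{W}_z, \qquad \mathbf{S} = A^{-1}\mathbf{R}, \qquad A = \frac{\partial \mathbf{U}}{\partial \mathbf{f}}.$$

$\mathbf{W}_r$ and $\mathbf{W}_z$ contain the terms with derivatives in the r and z directions, respectively; $\mathbf{R}$ contains the magnetic force, the thermal source, and the terms with mixed derivatives. The magnetic induction equations can also be written in divergent and nondivergent forms:

$$\frac{\partial \mathbf{H}}{\partial t} = -\left(\mathbf{W}_r^H + \mathbf{W}_z^H\right) + \mathbf{R}^H = -\mathbf{W}^H, \tag{11}$$

$$\frac{\partial \mathbf{f}^H}{\partial t} = -\left(\Omega_r^H + \Omega_z^H\right)\mathbf{f}^H + \mathbf{S}^H. \tag{12}$$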

In a similar manner, $\mathbf{W}_r^H$ and $\mathbf{W}_z^H$ contain the terms with derivatives in the r and z directions, respectively, and $\mathbf{R}^H$ contains the terms with mixed derivatives. The computational domain is chosen as a section of a cylinder of length L and radius R with a dense plasmoid located at its center. By virtue of the symmetry, the following conditions are imposed on the axis r = 0: $\partial n_s/\partial r = \partial v_z/\partial r = \partial p/\partial r = \partial H_z/\partial r = v_r = v_\varphi = H_r = H_\varphi = 0$. It is assumed that the perturbations from the plasmoid do not reach the upper and lateral boundaries, which are far enough from the center. On these boundaries the conditions for the background plasma are specified as follows: $H_r = H_\varphi = v_r = v_z = v_\varphi = 0$, $p = p_\infty$, $n_s = n_{s\infty}$, $H_z = H_{z\infty}$. In the computational domain L × R we introduce a difference grid with constant steps in space. The differential operators ∂/∂r and ∂/∂z are approximated by the difference operators $\Lambda_1^k$ and $\Lambda_2^k$ of order k, where k = 1, 2, ... (the superscript k is omitted below). The convective terms $v_r\,\partial/\partial r$ and $v_z\,\partial/\partial z$ are approximated by one-sided difference operators of the first order (k = 1) chosen according to the sign of the velocities $v_r$ and $v_z$. The terms with the pressure (magnetic and gasdynamic) are approximated by the formulas conjugate to the convective terms (see [1]), and the second derivatives are approximated by symmetric three-point difference operators. In order to construct an economical scheme we split the operators $\Omega_\alpha$ (α = r, z) with respect to physical processes, i.e., we write them as $\Omega_\alpha = \Omega_{\alpha 1} + \Omega_{\alpha 2}$. The operator $\Omega_{\alpha 1}$ contains the terms with the pressure, the free inertial terms from the equations of motion (which appear


in the axisymmetric coordinate frame), and all terms of the energy equation along the α direction. The operator $\Omega_{\alpha 2}$ contains the convective terms from the equations of motion and all terms of the continuity equations. We note that the continuity equations are approximated in divergent form. For the numerical solution of the equations of gas dynamics (9) or (10), we consider the following scheme of approximate factorization with splitting of the operators with respect to physical processes and spatial directions:

$$\prod_{j=1}^{2}\left(I + \tau\alpha\,\Omega_{rj}\right)\left(I + \tau\alpha\,\Omega_{zj}\right)\frac{\mathbf{f}^{n+1} - \mathbf{f}^{n}}{\tau} = -\left(A^{-1}\right)^{n}\mathbf{W}^{n}, \tag{13}$$

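or the equivalent fractional-step scheme:

$$\begin{aligned}
\xi^{n} &= -\left(A^{-1}\right)^{n} W^{n},\\
(I + \tau\alpha\,\Omega_{r1})\,\xi^{n+1/4} &= \xi^{n},\\
(I + \tau\alpha\,\Omega_{r2})\,\xi^{n+2/4} &= \xi^{n+1/4},\\
(I + \tau\alpha\,\Omega_{z1})\,\xi^{n+3/4} &= \xi^{n+2/4},\\
(I + \tau\alpha\,\Omega_{z2})\,\xi^{n+1} &= \xi^{n+3/4},\\
f^{n+1} &= f^{n} + \tau\,\xi^{n+1},
\end{aligned} \tag{14}$$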

where $\tau$ is the time step, $n$ is the time-step number, and $\alpha$ is the weight parameter. The final difference scheme (13) or (14) approximates the basic equations (9) with the order $O(\tau + h)$. The fractional-step scheme (14) is implemented by scalar three-point sweeps, similarly to the splitting scheme of [1]. The terms on the right-hand side of the scheme are approximated in conservative form, which improves the accuracy of the calculation. In the absence of a magnetic field, the difference scheme (13) is unconditionally stable. After the gas-dynamic parameters have been found on the $(n+1)$-th time layer, the magnetic-induction equations (7) are solved. To solve them numerically we constructed, in a similar manner, the following scheme of approximate factorization with operator splitting into spatial directions:

$$\prod_{j=1}^{2} \left(I + \tau\alpha\,\Omega^{H}_{rj}\right)\left(I + \tau\alpha\,\Omega^{H}_{zj}\right)\frac{(f^{H})^{n+1} - (f^{H})^{n}}{\tau} = -\left(W^{H}\right)^{n}. \tag{15}$$

Similarly to (13), scheme (15) is implemented by scalar sweeps and is unconditionally stable for $\alpha \ge 0.5$. The explicit treatment of the magnetic field in scheme (13) destroys the unconditional stability of schemes (13) and (15), but it allows the equations to be solved efficiently by scalar sweeps, which makes the algorithm economical.
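The scalar sweeps mentioned above can be illustrated by a small sketch. It is not the author's code: it treats a single model fractional step $(I + \tau\alpha\,v\Lambda)\,\xi^{new} = \xi^{old}$ in one space direction, where $\Lambda$ is a first-order one-sided difference chosen by the sign of $v$, and solves the resulting three-point system by the standard forward and backward (Thomas) sweep. The function names, the fixed end values, and the omission of the pressure terms are assumptions made only for illustration.

```python
def thomas_sweep(lower, diag, upper, rhs):
    """Solve a tridiagonal system with one forward and one backward sweep.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies
    x[i+1]; lower[0] and upper[-1] are never used.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def upwind_bands(v, tau, h, alpha):
    """Bands of I + tau*alpha*v*Lambda with a first-order upwind Lambda."""
    k = tau * alpha / h
    lower, diag, upper = [], [], []
    for vi in v:
        vp, vm = max(vi, 0.0), min(vi, 0.0)
        lower.append(-k * vp)            # backward difference where v > 0
        upper.append(k * vm)             # forward difference where v < 0
        diag.append(1.0 + k * (vp - vm))
    # Assumed boundary treatment (illustration only): end values are held fixed.
    lower[0] = upper[0] = lower[-1] = upper[-1] = 0.0
    diag[0] = diag[-1] = 1.0
    return lower, diag, upper


def fractional_step(xi, v, tau, h, alpha=1.0):
    """One implicit model step (I + tau*alpha*v*Lambda) xi_new = xi."""
    lower, diag, upper = upwind_bands(v, tau, h, alpha)
    return thomas_sweep(lower, diag, upper, xi)


if __name__ == "__main__":
    velocity = [1.0, 0.5, -0.3, -1.0, 0.2, 0.7]
    xi_old = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
    print(fractional_step(xi_old, velocity, tau=0.1, h=0.5))
```

Because the upwind choice makes the matrix diagonally dominant for any $\tau > 0$, the sweep needs no pivoting, which is consistent with the cost argument made for scheme (14).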

3 Calculation Results

The numerical algorithm was tested on initial data taken from [2] for the one-fluid numerical model in the absence of a magnetic field. The results of these tests show good agreement between the models.


N.A. Huber

Fig. 1. Initial electron density

In a series of calculations, we studied the propagation of a gas cloud, located at the center of the computational domain, into the rarefied background plasma. The following initial state was used. The unperturbed values were specified outside the plasmoid: $n_{sD} = 1.0$, $n_{sL} = 0$ (the density is normalized to $10^{15}$ cm$^{-3}$), $T_i = 1.5$ (the temperature is normalized to 1 eV), $V_i = V_e = 0$. Inside the plasmoid the initial values varied from the background values to the values at the plasmoid center: $n_{sDc} = 500.0$, $n_{sLc} = 500.0$, $T_{ic} = 1.1$. The magnetic field was constant in the computational domain at the initial time: $H_r = 0$, $H_\varphi = 0$, $H_z = 0.01$ (the magnetic field is normalized to 45 kGs). The initial electron density and pressure were calculated from the relations $n_e = Z_D n_{sD} f_D(T_e) + Z_L n_{sL} f_L(T_e)$ and $p_e = n_e k T_e$. The initial electron density distribution is shown in Fig. 1. One of the plasma parameters of interest is its temperature; the initial electron temperature distribution is given in Fig. 2.

In the present paper we give the results of numerical experiments with the presented model. In these calculations we assume that there is no thermal force and no frictional force between electrons and ions. This assumption simplifies the magnetic field equations. The action of the relativistic electron beam was also treated in a simplified manner: the energy absorption by electrons takes place not along the magnetic field lines but along the $z$-axis, which is an approximation.

Below we give the results of the calculations of plasmoid propagation at the time $t = 0.65\,\mu$s. Fig. 3 shows the distribution of the electron density, and Fig. 4 shows its level lines. These figures show that the dense plasma propagates mostly along the $z$-axis (that is, along the magnetic field). This is the expected plasma behavior in a strong magnetic field.

The electron temperature distribution is given in Fig. 5. Comparing with Fig. 2, one can see that the electron temperature rises. This is due to the action of the relativistic electron beam. Its energy is transferred to the plasmoid and background plasma


Fig. 2. Initial electron temperature

Fig. 3. Electron density

Fig. 4. Level lines of electron density


Fig. 5. Electron temperature

through electrons. As the calculations show, the electron and ion temperatures quickly become equal, which is also confirmed by theoretical estimates. Because of the peculiarities of the absorption of the relativistic electron beam energy, the outer part of the plasmoid is heated more rapidly than the inner part, as can also be seen in Fig. 5. The results of the calculations also agree well with the theoretical fact that the shock wave velocity in the direction across the magnetic field increases as the magnetic field increases.

4 Conclusions

A numerical model for the axisymmetric problem of the propagation of a dense plasmoid into a rarefied background plasma in an external magnetic field with an external heat source is considered in the magnetohydrodynamic three-fluid approximation. The main regularities of the effect of the magnetic field on plasma expansion are obtained by numerical simulation. The effect of the external source on plasma expansion is estimated. The numerical experiments show agreement between the calculations with the presented model and theoretical conceptions of plasma behavior. The model allows efficient calculations of plasma motions with strong parameter drops.


References

1. Kovenya VM, Tarnavskii GA, Chernyi SG (1990) Use of the Splitting Method in Aerodynamic Problems. Novosibirsk: Nauka (in Russian)
2. Astrelin VT, Burdakov AV, Huber NA, Kovenya VM (2001) Prikl Mekh Tekhn Fiz 42(6):3–18

Numerics and simulations for convection dominated problems

D. Kröner

Institute of Applied Mathematics, University of Freiburg i. Br., Hermann-Herder-Str. 10, 79104 Freiburg i. Br., Germany

Summary. The most important challenges in numerical simulation are the development of codes for new problems, the improvement of the performance of existing codes, and their validation. In this paper I will focus on the second topic. For inviscid compressible and convection dominated flows, in particular for problems from magnetohydrodynamics and for flows through porous media, we will demonstrate some tools which are useful for more efficient codes: local grid refinement based on rigorous a posteriori error estimates, artificial boundary conditions for problems in outer domains, higher order schemes, balanced schemes for problems with source terms, and relaxation schemes.

1 Introduction

The quality of numerical schemes and the power of modern computers have now reached a level at which realistic time dependent simulations in three spatial dimensions are possible. The main focus of our work is the simulation of compressible viscous and inviscid fluids, electrically conducting fluids, reactive flows, flow through porous media, and phase transitions. The common feature of all these flow problems is the dominating role of convection compared to diffusive and viscous effects. These types of flow problems occur in different applications which we investigate in several projects: time dependent flows through two stroke engines and turbines, mathematical modelling of the atmosphere of the sun, detonation and deflagration, biodegradation, fuel cells and human tissues, and cavitation.


For the discretization of the underlying mathematical models we will use discontinuous Galerkin methods and in particular finite volume methods, which are special discontinuous Galerkin methods with piecewise constant ansatz functions. The main tools for improving the performance of the numerical algorithms are adaptive time steps, dynamic local grid adaption, efficient treatment of problems with source terms, relaxation methods for real gases, and transparent boundary conditions for problems in unbounded or large domains. These methods have to be implemented on the most powerful modern parallel computers. In particular, the combination of dynamic local grid adaption and parallel computing will only be efficient on the basis of a powerful dynamic load balancing. In the following we report on some projects and results from our group which concern the topics mentioned above.

2 Finite volume schemes

We are going to solve the initial value problem for nonlinear conservation laws, which can be written in the form

$$\partial_t u + \nabla\cdot f(u) = 0 \ \text{ in } \mathbb{R}^n \times \mathbb{R}^+, \qquad u(x, 0) = u_0(x) \ \text{ in } \mathbb{R}^n, \tag{1}$$

where $f$ and $u_0$ are given functions. A standard method to solve this problem numerically is the following finite volume scheme. Let $\mathcal{T} := \{T_j \mid j \in I\}$ be an admissible triangulation. The discrete initial data on the triangle $T_j$ are defined as

$$u_j^0 := \frac{1}{|T_j|}\int_{T_j} u_0(x)\,dx \quad \text{for all } j \in I,$$

and the discrete values at the new time level on the triangle $T_j$ are defined as

$$u_j^{n+1} = u_j^n - \frac{\Delta t}{|T_j|}\sum_{l \in N(j)} g_{jl}^n(u_j^n, u_l^n), \tag{2}$$


where $\Delta t$ is the time step and $N(j)$ denotes the indices of the triangles neighboring $T_j$. For the numerical flux $g_{jl}^n(u, v)$ we have to assume (e.g. in the scalar case) for all $j, l \in I$ and all $u, v \in \mathbb{R}$:

$g_{jl}^n \in C^{0,1}(\mathbb{R}^2, \mathbb{R})$,
$g_{jl}^n(u, v) = -g_{lj}^n(v, u)$,
$g_{jl}^n(u, u) = \nu_{jl}\cdot f(u)$, where $\nu_{jl}$ denotes the outer normal to the triangle $T_j$ in the direction of $T_l$ of length $|S_{jl}|$, the length of the joint edge between $T_j$ and $T_l$,
$g_{jl}^n(u, v)$ is monotone increasing in $u$ and decreasing in $v$.

For this scheme the following results are available. For first-order schemes we have $u_h \to u$ in $L^1_{loc}$, where $u$ is the unique entropy solution of (1) [18], [16]. Also for second-order schemes we have $u_h \to u$ in $L^1_{loc}$ [17].

For the first- and second-order schemes we have $\|u_h - u\|_{L^1} \le c\,h^{1/4}$ [4], [23], [33]. From numerical experiments we expect that the optimal estimate should be $\|u_h - u\|_{L^1} \le c\,h^{1/2}$, but this is an open problem. These results are also true for weakly coupled systems of the form

$$\partial_t u_k + \nabla\cdot f_k(u_k) = g(u_1, \ldots, u_n) \ \text{ in } \mathbb{R}^n \times \mathbb{R}^+, \qquad u_k(x, 0) = u_{0k}(x) \ \text{ in } \mathbb{R}^n, \tag{3}$$

for $k = 1, \ldots, n$ [27].
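As an illustration of scheme (2) and of the flux conditions, the following minimal sketch specializes to one space dimension with periodic cells of equal length $h$. The Lax-Friedrichs flux, Burgers' flux $f(u) = u^2/2$, and all function names are choices made here for the example, not taken from the paper. The Lax-Friedrichs flux is consistent, and it is monotone provided `lam` bounds $|f'(u)|$ and $\lambda\,\Delta t/h \le 1$.

```python
def lax_friedrichs_flux(f, lam):
    """Return g(u, v) = (f(u) + f(v))/2 - lam*(v - u)/2.

    g(u, u) = f(u) (consistency); g is increasing in u and decreasing in v
    as long as lam >= |f'| on the range of the data (monotonicity).
    """
    def g(u, v):
        return 0.5 * (f(u) + f(v)) - 0.5 * lam * (v - u)
    return g


def fv_step(u, g, dt, h):
    """One step of scheme (2) on a periodic 1D grid with cells of length h."""
    n = len(u)
    new = []
    for j in range(n):
        # Flux through the right edge (outer normal +1).
        out_right = g(u[j], u[(j + 1) % n])
        # Flux through the left edge (outer normal -1); by the antisymmetry
        # condition this is minus the flux seen from the left neighbor.
        out_left = -g(u[j - 1], u[j])
        new.append(u[j] - dt / h * (out_right + out_left))
    return new


def burgers(u):
    """Flux of the inviscid Burgers equation, used only as a test case."""
    return 0.5 * u * u


if __name__ == "__main__":
    g = lax_friedrichs_flux(burgers, lam=1.0)
    u = [1.0 if 5 <= j < 10 else -0.5 for j in range(20)]
    for _ in range(10):
        u = fv_step(u, g, dt=0.02, h=0.05)
    print(u)
```

The antisymmetry of the flux makes the update conservative: on a periodic grid the sum of the cell values is unchanged by `fv_step` up to rounding.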

3 A posteriori error estimates for conservation laws

One of the most efficient tools for improving the performance of a code in several space dimensions is adaptive mesh refinement, controlled by rigorous a posteriori error estimators. In more detail, for the exact solution $u$ of the initial value problem (1) we want to compute a numerical solution $u_h$ such that the error $\|u - u_h\|$ is less than a given tolerance $\varepsilon$. This can be done in an efficient way if we have an a posteriori error estimate of the form

$$\|u - u_h\| \le \sum_{j \in I} \eta_j(u_h) \le \varepsilon, \tag{4}$$

where $\eta_j(u_h)$ is a local quantity on the triangle $T_j$ which depends only on the numerical solution $u_h$ and can be computed explicitly. If the sum in (4) is less than the given tolerance $\varepsilon$, then also $\|u - u_h\| \le \varepsilon$ and we can stop. If not, we mark those triangles for which the quantity $\eta_j(u_h)$ is too large and refine them. Then we repeat the computation on the new grid and check again whether the sum in (4) is less than the given tolerance $\varepsilon$. In [19] we were able to prove an estimate like (4). More precisely, we have shown the following theorem.

Theorem 1. Let $u$ be the solution of (1) and let $u_h$ be the numerical solution of the finite volume scheme (2), such that $u_h(x, t) = u_j^n$ if $x \in T_j$ and $t_n \le t < t_{n+1}$. Then we have

$$\int_K |u - u_h|\,dx\,dt \le cT\left(\int_{\{|x - x_0| \le R+1\}} |u_0(x) - u_h(x, 0)|\,dx + Q + Q^{1/2}\right). \tag{5}$$

Here $K$ denotes a given set in the $(x, t)$-space bounded by $t_N$. Let $C$ denote the smallest cone of dependence containing $K$. $R$ is the radius of a circle which includes the part of $C$ at time $t = 0$, and $Q$ is defined as

$$Q := \sum_{n}\sum_{j \in M(t_n)} \Delta t\,h_j^2\,|u_j^{n+1} - u_j^n| + 2L\,\Delta t\sum_{n}\sum_{e \in E(t_n)} (\Delta t + h_e^2)\,|u_{e+}^n - u_{e-}^n|.$$

The sum over $n$ runs over all time steps up to $t_N$. The set $M(t_n)$ is the set of all triangles in the intersection of $C$ at time $t_n$ with the triangulation, $h_j$ is the diameter of the cell $T_j$, $L$ is a known constant, $E(t_n)$ is the set of all edges in the intersection of $C$ at time $t_n$ with the triangulation, $u_{e+}^n$ and $u_{e-}^n$ are the left and right limits of $u_h$ on the cells neighboring the edge $e$, and $h_e$ is the length of the edge $e$. The proof of this theorem, more details, and numerical experiments can be found in [19]. The proof makes extensive use of the Kruzkov entropy condition [20], the Kuznetsov error estimates [21], and some basic results of [2]. This result has been generalized in [25], [26] to degenerate parabolic equations of the form

$$\partial_t u + \operatorname{div}\big(v f(u) - D(u)\nabla u\big) + \lambda u = 0 \ \text{ in } \mathbb{R}^d \times\, ]0, T[, \qquad u(\cdot, 0) = u_0 \ \text{ in } \mathbb{R}^d, \tag{6}$$

where

$$f \in C^2(\mathbb{R}, \mathbb{R}), \quad D \in C^1(\mathbb{R}), \quad D(s) \ge 0 \ \ \forall s \in \mathbb{R}, \quad v \in C^1(\mathbb{R}^d \times\, ]0, T[, \mathbb{R}^2). \tag{7}$$


As for scalar conservation laws, weak solutions $u \in L^1(\mathbb{R}^d \times \mathbb{R}^+)$ of (6) are not unique, and therefore one has to define entropy solutions. Furthermore, the result of Theorem 1 can be generalized to weakly coupled systems [24], in particular to those which can be written in the form (see [24], [15])

$$\begin{aligned}
\phi\,\partial_t c_o + \operatorname{div}\big(u c_o - \phi D(u)\nabla c_o\big) &= -\nu_o\,k_{gr}(c_o, c_s, B),\\
\phi\,\partial_t c_s + \operatorname{div}\big(u c_s - \phi D(u)\nabla c_s\big) &= -\nu_s\,k_{gr}(c_o, c_s, B),\\
\partial_t B &= \nu_B\,k_{gr}(c_o, c_s, B) - k_{dec} B,\\
k_{gr}(c_o, c_s, B) &:= \mu\,\frac{c_o}{c_o + K_o}\,\frac{c_s}{c_s + K_s}\,B.
\end{aligned} \tag{8}$$

This initial boundary value problem is a mathematical model for biodegradation in porous media. We consider a flow with a given velocity $u$ through a porous medium. In the porous medium we have oxygen with a concentration $c_o$ and immobile biomass with concentration $B$. Initially we have (inject) a substrate with a concentration $c_s$ in a small open set. The chemical reaction between the biomass, oxygen and substrate is given by the right-hand side in (8). The substrate and the oxygen are transported by diffusion and advection. The biomass is immobile, and the reaction with oxygen and the substrate destroys the substrate. More details can be found in [15].

Unfortunately the a posteriori error estimate of Theorem 1 holds only for scalar equations and weakly coupled systems in several space dimensions. It has not been proved for general nonlinear systems like the Euler equations of gas dynamics or the MHD equations. Nevertheless we have successfully used expressions similar to those in (5) for the dynamic grid control for systems of conservation laws. In particular, reactive flows with a detailed modelling of the chemical reactions could be simulated on present computers only by using dynamic local grid adaption, even in 2D [13], [14].
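The solve, estimate, mark, refine loop built around the estimate (4) can be sketched in a few lines. This is a schematic illustration on a 1D mesh of intervals, not the implementation used in [19]: the marking rule (refine every cell whose indicator exceeds an equal share of the tolerance), the bisection refinement, and the callable `estimate`, which stands for "solve on the current grid and return the local indicators $\eta_j$", are all assumptions of this example.

```python
def mark_cells(eta, tol):
    """Indices of cells whose indicator exceeds an equal share of tol.

    If sum(eta) > tol, at least one cell exceeds tol/len(eta), so the
    marked set is never empty while the stopping test fails.
    """
    share = tol / len(eta)
    return [j for j, e in enumerate(eta) if e > share]


def refine(cells, marked):
    """Bisect every marked cell; cells are (left, right) intervals."""
    marked = set(marked)
    out = []
    for j, (a, b) in enumerate(cells):
        if j in marked:
            m = 0.5 * (a + b)
            out.extend([(a, m), (m, b)])
        else:
            out.append((a, b))
    return out


def adapt(cells, estimate, tol, max_rounds=20):
    """Repeat estimate/mark/refine until the sum of indicators is <= tol."""
    for _ in range(max_rounds):
        eta = estimate(cells)
        if sum(eta) <= tol:
            break
        cells = refine(cells, mark_cells(eta, tol))
    return cells


def toy_estimate(cells):
    """Hypothetical indicator: h^2, weighted more heavily near x = 0.5."""
    return [(b - a) ** 2 * (10.0 if a <= 0.5 <= b else 1.0) for a, b in cells]


if __name__ == "__main__":
    start = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
    final = adapt(start, toy_estimate, tol=0.05)
    print(len(final), sum(toy_estimate(final)))
```

In a real computation `estimate` would rerun the finite volume scheme on the refined grid and evaluate the local contributions to $Q$; the loop structure stays the same.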

4 Discontinuous Galerkin methods for Magnetohydrodynamics

Traditionally, finite volume schemes are used most frequently. They are special discontinuous Galerkin methods with piecewise constant ansatz functions. Higher order upwind finite volume schemes, however, are different from higher order discontinuous Galerkin methods. In some cases the latter are more efficient than higher order upwind finite volume schemes, and their discretization stencil is more compact. Therefore we will present some numerical results concerning the comparison of higher order upwind finite volume and higher order discontinuous Galerkin schemes applied to the MHD equations

$$\begin{aligned}
\partial_t \rho + \nabla\cdot(\rho\mathbf{u}) &= 0 && \text{(conservation of mass)},\\
\partial_t(\rho\mathbf{u}) + \nabla\cdot(\rho\mathbf{u}\mathbf{u}^t + P) &= 0 && \text{(conservation of momentum)},\\
\partial_t \mathbf{B} + \nabla\cdot(\mathbf{u}\mathbf{B}^t - \mathbf{B}\mathbf{u}^t) &= 0 && \text{(induction equation)},\\
\partial_t(\rho e) + \nabla\cdot(\rho e\,\mathbf{u} + P\mathbf{u}) &= 0 && \text{(conservation of energy)},\\
\nabla\cdot\mathbf{B} &= 0 && \text{(divergence constraint)}.
\end{aligned} \tag{9}$$

First let us briefly explain the basic idea of the discontinuous Galerkin schemes for conservation laws, written in the form

$$\partial_t u + \nabla\cdot f(u) = 0 \ \text{ in } \mathbb{R}^2 \times (0, T), \qquad u(x, 0) = u_0(x) \ \text{ in } \mathbb{R}^2. \tag{10}$$

We look for a numerical approximation $u_h$ of the exact solution $u$ of (10) in the space

$$V_h = \{ v_h \in L^\infty(\mathbb{R}^2) : v_h|_{T_j} \in P^k(T_j) \ \text{for all } j \in I \}, \tag{11}$$

where $P^k(T_j)$ is the set of all polynomials of order $k$ on $T_j$. Let $(v_{j,i})_{i=1}^{n}$ be a basis of the space $P^k(T_j)$ and

$$u_h(x, t) = \sum_{i=1}^{n} u_{ji}(t)\,v_{j,i}(x), \qquad x \in T_j, \quad 0 \le t \le T.$$

Then the definition of the discontinuous Galerkin scheme is given by

$$\frac{d}{dt}\int_{T_j} u_h(x, t)\,v_h(x)\,dx = -\sum_{S_{jl} \subset \partial T_j}\int_{S_{jl}} f(u_h(x, t))\cdot n_{jl}\,v_h(x)\,dx + \int_{T_j} f(u_h(x, t))\cdot\nabla v_h(x)\,dx \quad \text{for all } v_h \in V_h. \tag{12}$$

The integrals will be replaced by suitable quadrature rules. In particular, for the boundary integral along $S_{jl}$ we use the same numerical fluxes as in (2). For the time discretization we use higher order Runge-Kutta methods. The first results concern the 1D MHD advection problem with a known smooth solution [28], [7]. In the following table we present the CPU time, the $L^1$-error and the experimental order of convergence (EOC) for the finite volume scheme of first (FV 1st order) and second order (FV 2nd order), and for the discontinuous Galerkin methods of first (rkdg 1st order), second (rkdg 2nd order), and third (rkdg 3rd order) order for different mesh sizes $h$. The best result for the second order finite volume scheme is obtained for the grid size $h = 0.0003125$. The corresponding CPU time is 8350.4 s. The second order discontinuous Galerkin method is able to reach a better accuracy in 3497.73 s for a grid size $h = 0.00125$. The efficiency of the third order discontinuous Galerkin method seems to be comparable to that of the second order finite volume scheme for this magnitude of $h$. If we look at the EOCs we see that those for the third order discontinuous Galerkin method converge to three. This of course implies that the third order discontinuous Galerkin method will be the most efficient for smaller $h$. The results in the second table have been obtained [12] with a different TVB limiter due to Shu [3]. The limiter needs a parameter $M$ which has to be chosen a priori in a proper way. The results are shown for different choices of $M$. The case $M = 0$ corresponds to the classical limiter which has been used for the tests in the first table. The results become better for larger $M$, and the efficiency is much better than for the second order discontinuous Galerkin method in the first table. For further results about discontinuous Galerkin methods we also refer to [1], [11], [28].
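The experimental order of convergence quoted in the tables can be recomputed from two consecutive rows. The sketch below uses the usual definition $\mathrm{EOC} = \log(e_{coarse}/e_{fine}) / \log(h_{coarse}/h_{fine})$, which is assumed here since the text does not state the formula; the function name is ours, and the sample data are the first-order finite volume errors listed in Table 1.

```python
import math


def eoc(err_coarse, err_fine, h_coarse, h_fine):
    """Experimental order of convergence between two mesh sizes."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)


if __name__ == "__main__":
    # (h, L1 error) pairs for "FV 1st order", copied from Table 1.
    rows = [
        (0.0050000, 0.061096),
        (0.0025000, 0.033519),
        (0.0012500, 0.017637),
        (0.0006250, 0.009059),
        (0.0003125, 0.004592),
    ]
    for (h0, e0), (h1, e1) in zip(rows, rows[1:]):
        print(f"h = {h1:.7f}  EOC = {eoc(e0, e1, h0, h1):.4f}")
```

For the first pair this gives about 0.866, in agreement with the tabulated value up to the rounding of the printed errors, which supports the assumed definition.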

5 Absorbing boundary conditions in unbounded domains

Another way to save CPU time is to reduce the size of the computational domain, in particular for problems in which the original continuous model is formulated in an unbounded domain. The simulation of the flow of electrically conducting fluids in the solar atmosphere is an excellent example to demonstrate this possibility. Let us consider again the MHD system

$$\begin{aligned}
\partial_t \rho + \nabla\cdot(\rho\mathbf{u}) &= 0 && \text{(conservation of mass)},\\
\partial_t(\rho u) + \partial_x(\rho u^2 + p_{11}) + \partial_z(\rho u v) &= \rho g && \text{(conservation of momentum)},\\
\partial_t(\rho v) + \partial_x(\rho u v) + \partial_z(\rho v^2 + p_{22}) &= \rho g, &&\\
\partial_t(\rho e) + \nabla\cdot(\rho e\,\mathbf{u} + P\mathbf{u}) &= 0 && \text{(conservation of energy)},\\
\partial_t \mathbf{B} + \nabla\cdot(\mathbf{u}\mathbf{B}^t - \mathbf{B}\mathbf{u}^t) &= 0 && \text{(induction equation)},\\
\nabla\cdot\mathbf{B} &= 0 && \text{(divergence constraint)},
\end{aligned} \tag{13}$$

with an additional equation of state and initial and boundary data. We consider the system (13) in an unbounded domain $D \cup D_1$ where

Table 1. CPU time, L1-error, experimental order of convergence (EOC)

                     FV 1st order                          FV 2nd order
h            cpu (sec)   L1 error    L1 eoc       cpu (sec)   L1 error    L1 eoc
0.0050000         7.74   0.061096    0.773313         33.95   0.008104    1.481747
0.0025000        26.52   0.033519    0.866114        131.65   0.002408    1.750702
0.0012500        95.30   0.017637    0.926352        520.28   0.000680    1.823480
0.0006250       362.50   0.009059    0.961241       2076.37   0.000182    1.905568
0.0003125      1426.33   0.004592    0.980146       8350.40   0.000048    1.911384

                     rkdg 1st order                        rkdg 2nd order
h            cpu (sec)   L1 error    L1 eoc       cpu (sec)   L1 error    L1 eoc
0.0050000         9.20   0.061105    0.774092        213.66   0.000689    2.318214
0.0025000        34.15   0.033521    0.866254        864.48   0.000136    2.344800
0.0012500       126.25   0.017638    0.926398       3497.73   0.000027    2.342311
0.0006250       489.90   0.009059    0.961230      14171.60   0.000005    2.347266
0.0003125      1951.20   0.004592    0.980095      57362.90   0.000001    2.318443

                     rkdg 3rd order
h            cpu (sec)   L1 error    L1 eoc
0.0050000      1297.39   0.0004584   2.502123
0.0025000      5395.29   0.0000762   2.589485
0.0012500     21696.70   0.0000123   2.636016
0.0006250     88940.10   0.0000019   2.695403
0.0003125    374429.00   0.0000003   2.871983

$D = \{(x, z) \mid 0 < x < x_0,\ 0 < z < z_0\}$, $D_1 = \{(x, z) \mid 0 < x < x_0,\ z_0 < z < \infty\}$. In addition to (9) we now have a gravitational term $\rho g$ in the equations for conservation of momentum in (13). We want to compute a numerical solution on the artificial bounded domain $D$. Then of course we need an additional artificial boundary condition on the artificial boundary $\Gamma := \{(x, z) \mid 0 < x < x_0,\ z = z_0\}$ such that the (numerical) solution on the bounded domain $D$ becomes a good approximation of the exact solution on the unbounded domain $D \cup D_1$. In order to derive this boundary condition we assume that all quantities can be written in the following form:

$$\rho(t, x, z) = \mathring{\rho}(z) + \tilde{\rho}(t, x, z), \quad u = \mathring{u} + \tilde{u}, \quad v = \mathring{v} + \tilde{v}, \quad p = \mathring{p} + \tilde{p}, \quad \mathbf{B} = \mathring{\mathbf{B}} + \tilde{\mathbf{B}}, \tag{14}$$

Table 2. The results for different choices of the parameter M

                          M = 0.0                   M = 50.0
h            cpu (sec)    L1 error       L1 eoc     L1 error       L1 eoc
0.0400000        16.92    6.050958e-02              2.954139e-02
0.0200000        79.78    1.625419e-02   1.896      7.842982e-03   1.913
0.0100000       320.09    3.486139e-03   2.221      1.484416e-03   2.401
0.0050000      1297.39    6.705800e-04   2.378      2.385400e-04   2.637
0.0025000      5395.29    1.246134e-04   2.428      3.340195e-05   2.836
0.0012500     21696.70    2.253667e-05   2.467      4.526632e-06   2.883
0.0006250     88940.10    3.965033e-06   2.507      6.225468e-07   2.862
0.0003125    374429.00    6.836333e-07   2.536      8.606711e-08   2.855

                          M = 300.0
h            cpu (sec)    L1 error       L1 eoc
0.0400000        16.92    9.769210e-03
0.0200000        79.78    1.253958e-03   2.962
0.0100000       320.09    1.772760e-04   2.822
0.0050000      1297.39    2.400000e-05   2.884
0.0025000      5395.29    3.410175e-06   2.815
0.0012500     21696.70    4.611064e-07   2.887
0.0006250     88940.10    6.187187e-08   2.898
0.0003125    374429.00    8.449158e-09   2.872

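where $\mathring{\rho}, \mathring{u}, \mathring{v}, \mathring{p}, \mathring{\mathbf{B}}$ are a stationary solution of (13), i.e.

$$\partial_z \mathring{p}(z) = g(z)\,\mathring{\rho}(z), \quad \mathring{u} = 0, \quad \mathring{v} = 0, \quad \mathring{\mathbf{B}} = 0, \quad \mathring{\rho}^{\gamma} = \mathring{p},$$

and $\tilde{\rho}, \tilde{u}, \tilde{v}, \tilde{p}, \tilde{\mathbf{B}}$ are assumed to be small perturbations. Now we use (14) to linearize (13) and obtain

$$\begin{aligned}
\partial_t \tilde{\rho} + \mathring{\rho}\,\partial_x \tilde{u} + \partial_z(\mathring{\rho}\,\tilde{v}) &= 0,\\
\mathring{\rho}\,\partial_t \tilde{u} + \partial_x \tilde{p} &= 0,\\
\mathring{\rho}\,\partial_t \tilde{v} + \partial_z \tilde{p} - g\tilde{\rho} &= 0 \qquad \text{in } D \cup D_1,\\
\partial_t \tilde{p} + \gamma\mathring{p}\,(\partial_x \tilde{u} + \partial_z \tilde{v}) + \mathring{\rho}\,g\,\tilde{v} &= 0,\\
\partial_t \tilde{\mathbf{B}} &= 0.
\end{aligned}$$

As initial and boundary data we use

$$(\tilde{\rho}, \tilde{u}, \tilde{v}, \tilde{p})\big|_{t=0} = 0; \qquad \text{for } z \to \infty:\ \tilde{p} = 0, \tag{15}$$

and periodic boundary conditions at $x = 0$ and $x = x_0$. Then after some technical calculations we obtain a PDE for $\tilde{p}$: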

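$$\partial_t^2 \tilde{p} - \Theta^{-1} q\,\big(\partial_z^2 \tilde{p} + \partial_x^2 \tilde{p}\big) + q'\,\partial_z \tilde{p} + \left(q'' - \frac{q'^2}{q}\right)\tilde{p} = 0, \tag{16}$$

where $q(z) := \frac{\gamma}{\gamma - 1}\,\mathring{p}^{\frac{\gamma - 1}{\gamma}}$ and $\Theta = \frac{1}{\gamma - 1}$. In order to derive an additional boundary condition on $\Gamma = \{(x, z) \mid 0 < x < x_0,\ z = z_0\}$, i.e. on the artificial boundary of $D$, we have to assume $q(z) := a^{-1} e^{-2\alpha z}$, which is convenient also from the physical point of view. Using the Laplace transformation with respect to $t$ and the Fourier transformation with respect to $x$ of (16), we obtain the following non-reflecting boundary condition on $\Gamma$:

$$\partial_t \tilde{p} + \frac{e^{-\alpha z}}{\sqrt{a\Theta}}\,\partial_z \tilde{p} + \frac{\alpha\Theta + \frac{\alpha}{2}}{\sqrt{a\Theta}}\,e^{-\alpha z}\,\tilde{p} - Q^{-1}\big(A_\lambda * Q\tilde{p}\big) = 0, \qquad A_\lambda * Q\tilde{p} = \int_0^t A_\lambda(t - t', z)\,Q\tilde{p}(t', z)\,dt'.$$

Due to our assumptions, (16) reduces to

$$\partial_t^2 \tilde{p} - \frac{\gamma\mathring{p}}{\mathring{\rho}}\big(\partial_z^2 \tilde{p} + \partial_x^2 \tilde{p}\big) + g\,\partial_z \tilde{p} + \left(\partial_z g - \frac{\gamma - 1}{\gamma}\,\frac{\mathring{\rho}}{\mathring{p}}\,g^2\right)\tilde{p} = 0. \tag{17}$$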

For the linear system the non-reflecting boundary condition on $\Gamma$ selects the (restriction to $D$ of the) exact solution on $D \cup D_1$, and we can prove the following theorem.

Theorem 2. [10], [32] Let $\tilde{p}$ be a solution of (17) in $D \cup D_1$ and let $\bar{p}$ be a solution of (17) in $D$ which satisfies the non-reflecting boundary condition on $\Gamma$. Furthermore assume that the boundary conditions (15) are also satisfied for $\tilde{p}$ and $\bar{p}$, respectively. Then we have $\tilde{p} = \bar{p}$ on $D$.

Although this result refers to linear systems, we have applied it with some success also to nonlinear systems. For further results we refer to [10], [32].

6 Relaxation schemes for general equations of state

For the simulation of the flow of real gases, as in the solar atmosphere, we have to take into account general equations of state, and this can reduce the performance of the whole code. For the Euler equations of gas dynamics this problem has been solved in [6] by using the energy relaxation method. In [8], [9] this idea has been generalized to the MHD system


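$$\begin{aligned}
\partial_t \rho + \nabla\cdot(\rho\mathbf{u}) &= 0 && \text{(conservation of mass)},\\
\partial_t(\rho\mathbf{u}) + \nabla\cdot(\rho\mathbf{u}\mathbf{u}^t + P) &= 0 && \text{(conservation of momentum)},\\
\partial_t \mathbf{B} + \nabla\cdot(\mathbf{u}\mathbf{B}^t - \mathbf{B}\mathbf{u}^t) &= 0 && \text{(induction equation)},\\
\partial_t(\rho e) + \nabla\cdot(\rho e\,\mathbf{u} + P\mathbf{u}) &= 0 && \text{(conservation of energy)},\\
\nabla\cdot\mathbf{B} &= 0 && \text{(divergence constraint)},\\
e - \varepsilon - \tfrac{1}{2}|\mathbf{u}|^2 - \tfrac{1}{8\pi\rho}|\mathbf{B}|^2 &= 0 && \text{(equation for the total energy)},\\
P - \big(p + \tfrac{1}{8\pi}|\mathbf{B}|^2\big)\mathrm{Id} + \tfrac{1}{4\pi}\mathbf{B}\mathbf{B}^t &= 0 && \text{(equation for the pressure tensor)},\\
p &= p(\rho, \varepsilon) && \text{(equation of state for the pressure)}.
\end{aligned}$$

Let us briefly explain the basic idea of the energy relaxation method for the above system. Here the equation of state $p = p(\rho, \varepsilon)$ models the real gas effect, and it is very expensive to adapt the code to this special law: new Riemann solvers have to be developed, or the poor Lax-Friedrichs scheme has to be used. The energy relaxation method can simply be added to the scheme and gives a much better efficiency than the Lax-Friedrichs scheme. We split the internal energy as $\varepsilon = \varepsilon_1 + \varepsilon_2$, introduce a much simpler equation of state $p_1 = p_1(\rho, \varepsilon_1)$ instead of the general equation of state $p = p(\rho, \varepsilon)$, and define $\Phi(\rho, \varepsilon_1) = \varepsilon(\rho, p_1) - \varepsilon_1$. Then instead of the above MHD system we consider the following relaxation system:

$$\begin{aligned}
\partial_t \rho + \nabla\cdot(\rho\mathbf{u}) &= 0 && \text{(conservation of mass)},\\
\partial_t(\rho\mathbf{u}) + \nabla\cdot(\rho\mathbf{u}\mathbf{u}^t + P_1) &= 0 && \text{(conservation of momentum)},\\
\partial_t \mathbf{B} + \nabla\cdot(\mathbf{u}\mathbf{B}^t - \mathbf{B}\mathbf{u}^t) &= 0 && \text{(induction equation)},\\
\partial_t(\rho e_1) + \nabla\cdot(\rho e_1\mathbf{u} + P_1\mathbf{u}) &= \lambda\rho\,\big(\varepsilon_2 - \Phi(\rho, \varepsilon_1)\big), &&\\
\partial_t(\rho\varepsilon_2) + \nabla\cdot(\rho\varepsilon_2\mathbf{u}) &= -\lambda\rho\,\big(\varepsilon_2 - \Phi(\rho, \varepsilon_1)\big), &&\\
\nabla\cdot\mathbf{B} &= 0 && \text{(divergence constraint)},\\
e_1 - \varepsilon_1 - \tfrac{1}{2}|\mathbf{u}|^2 - \tfrac{1}{8\pi\rho}|\mathbf{B}|^2 &= 0 && \text{(equation for the total energy)},\\
P_1 - \big(p_1 + \tfrac{1}{8\pi}|\mathbf{B}|^2\big)\mathrm{Id} + \tfrac{1}{4\pi}\mathbf{B}\mathbf{B}^t &= 0 && \text{(equation for the pressure tensor)},\\
p_1 &= p_1(\rho, \varepsilon_1) && \text{(equation of state for the pressure } p_1\text{)}.
\end{aligned}$$

Theorem 3. (Coquel, Perthame [6]) If $U(x, t) = \lim_{\lambda \to \infty} U^\lambda(x, t)$ exists, then $U$ satisfies the original MHD system.

In [8], [9] it was shown that the energy relaxation method is more than two times faster than the classical Lax-Friedrichs scheme in order to obtain the same accuracy, at least for special test problems.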
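The role of the relaxation terms can be made concrete with a small sketch of the limit $\lambda \to \infty$. The two source terms in the relaxation system cancel, so at fixed $\rho$, $\mathbf{u}$, $\mathbf{B}$ the sum $\varepsilon_1 + \varepsilon_2$ is preserved while $\varepsilon_2$ is driven to $\Phi(\rho, \varepsilon_1)$; by the definition of $\Phi$ this is equivalent to $p_1(\rho, \varepsilon_1) = p(\rho, \varepsilon_1 + \varepsilon_2)$. The code below performs this projection. The "real gas" law `p_real`, the constants, and the function names are hypothetical choices for illustration and are not the equations of state used in [8], [9].

```python
GAMMA = 1.4    # hypothetical parameter of the "real gas" law
KAPPA = 0.2    # hypothetical density correction in the "real gas" law
GAMMA1 = 3.0   # gamma of the simple law p1; assumed large enough that eps2 >= 0


def p_real(rho, eps):
    """Hypothetical general equation of state p = p(rho, eps)."""
    return (GAMMA - 1.0) * rho * eps * (1.0 + KAPPA * rho)


def eps_of_p(rho, p):
    """Inverse of p_real with respect to eps, i.e. eps(rho, p)."""
    return p / ((GAMMA - 1.0) * rho * (1.0 + KAPPA * rho))


def p1(rho, eps1):
    """Simple polytropic law used by the relaxation system."""
    return (GAMMA1 - 1.0) * rho * eps1


def phi(rho, eps1):
    """Phi(rho, eps1) = eps(rho, p1(rho, eps1)) - eps1."""
    return eps_of_p(rho, p1(rho, eps1)) - eps1


def relax(rho, eps1, eps2):
    """Projection for lambda -> infinity.

    Keeps eps1 + eps2 and enforces eps2 = phi(rho, eps1), which amounts to
    choosing eps1 so that p1(rho, eps1) equals the real-gas pressure.
    """
    eps = eps1 + eps2
    new_eps1 = p_real(rho, eps) / ((GAMMA1 - 1.0) * rho)
    return new_eps1, eps - new_eps1


if __name__ == "__main__":
    e1, e2 = relax(rho=1.0, eps1=2.0, eps2=0.5)
    print(e1, e2, phi(1.0, e1))
```

In a time-stepping code one would advance the homogeneous system with a Riemann solver for the simple law $p_1$ and then apply such a projection cell by cell, which is why no new Riemann solver is needed for the general law.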

7 Parallelization

For our codes in 3D, i.e. the MHD code and the compressible Navier-Stokes code, we could considerably improve the performance by using parallel computers with distributed memory. Because of the dominance of convection all codes are explicit, and we can distribute the load to the different processors by a domain decomposition. The efficiency depends on the distribution: each processor should need approximately the same time to compute the values for the new time level. Since we refine and coarsen different parts of the grid after each time step (or after a fixed number of time steps), the load distribution may become unbalanced and a repartitioning is necessary. A general strategy for dynamic load balancing for general data structures has been developed in [29]. There the computational cost for each processor was estimated in terms of the number of elements in the corresponding domain. The redistribution was done if the load of one processor became larger than 1.1 times the average load of all processors. This concept has been generalized to the 3D MHD code in [9], [32]. For 32 processors they obtained a speedup of 6.25 and an efficiency of 0.78 compared to four processors.
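The rebalancing criterion and the quoted efficiency can be written down directly. In this sketch the load of a processor is its element count, as in the text; the function names are ours, and the efficiency is taken relative to the four-processor run, i.e. the speedup divided by the ideal factor $32/4 = 8$.

```python
def needs_rebalance(loads, threshold=1.1):
    """True if some processor carries more than threshold times the mean load."""
    average = sum(loads) / len(loads)
    return max(loads) > threshold * average


def relative_efficiency(speedup, procs, base_procs):
    """Efficiency of a run on procs processors relative to base_procs."""
    return speedup / (procs / base_procs)


if __name__ == "__main__":
    elements_per_processor = [1000, 1040, 980, 1300]
    print(needs_rebalance(elements_per_processor))
    print(relative_efficiency(6.25, procs=32, base_procs=4))
```

With the numbers from the text, $6.25 / 8 = 0.78125$, which rounds to the reported efficiency of 0.78.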

References

1. Becker J (1999) Entwicklung eines effizienten Verfahrens zur Lösung hyperbolischer Differentialgleichungen, Universität Freiburg, Dissertation, http://www.freidok.uni-freiburg.de/volltexte/123/
2. Chainais-Hillairet C (1999) M2AN, Math Model Numer Anal 33:129–156
3. Cockburn B, Hou S, Shu C-W (1990) Math Comp 54(190):545–581
4. Cockburn B, Coquel F, LeFloch P (1994) Math Comput 63:77–103
5. Cockburn B, Karniadakis E, Shu C-W (2000) The development of discontinuous Galerkin methods. Lecture Notes in Computational Science and Engineering 11:3–52
6. Coquel F, Perthame B (1998) SIAM J Numer Anal 35:2223–2249
7. Dedner A, Kröner D, Rohde C, Schnitzer C, Wesenberg M (2003) Comparison of finite volume and discontinuous Galerkin methods of higher order for systems of conservation laws in multiple space dimensions. In: Hildebrandt S, Karcher H (eds) Geometric Analysis and nonlinear partial differential equations. Berlin, 573–590
8. Dedner A, Wesenberg M (2001) Numerical methods for the real gas MHD equations. In: Freistühler H, Warnecke G (eds) Hyperbolic Problems: Theory, Numerics, Applications. International Series of Numerical Mathematics, Birkhäuser, Eighth International Conference in Magdeburg, February/March 2000 140:287–298
9. Dedner A (2003) Solving the system of radiation magnetohydrodynamics for solar physical simulations in 3d. Ph.D. thesis, University of Freiburg, Department of Applied Mathematics
10. Dedner A, Kröner D, Sofronov IL, Wesenberg M (2001) J Comput Phys 171(2):448–478
11. Dolejsi V, Feistauer M, Schwab C. On some aspects of the discontinuous Galerkin finite element method for conservation laws, to appear in: Math Comput Simulation
12. Klassen L (2003) TVB limiters for discontinuous Galerkin methods for conservation laws. Personal communication
13. Kröner D, Gessner T (2001) Godunov type methods on unstructured grids and local mesh refinement. In: Toro EF (ed) Godunov methods. Theory and applications. International conference, Oxford, GB, October 1999, 527–547
14. Gessner T (2000) Timedependent adaption for supersonic combustion waves modeled with detailed reaction mechanism. Ph.D. thesis, University of Freiburg
15. Klöfkorn R, Kröner D, Ohlberger M (2002) Int J Numer Meth Fluids 40:79–91
16. Kröner D, Rokyta M (1994) SIAM J Numer Anal 31:324–343
17. Kröner D, Noelle S, Rokyta M (1995) Numer Math 71(4):527–560
18. Kröner D (1997) Numerical schemes for conservation laws. Wiley-Teubner series advances in numerical mathematics. B. G. Teubner Verlagsgesellschaft mbH, Stuttgart, first edition
19. Kröner D, Ohlberger M (2000) Math Comput 69(229):25–39
20. Kruzkov SN (1970) Mat Sbornik 81:123 (in Russian) and Math USSR Sbornik 10:217–243
21. Kuznetsov NN (1976) USSR Comput Math Math Phys 16(6):105–119
22. Küther M (2000) East-West J Numer Math 8(4):299–322
23. Noelle S (1995) A note on entropy inequalities and error estimates for higher order accurate finite volume schemes on irregular families of grids. SFB 256, Preprint 400, Bonn
24. Ohlberger M, Rohde C (2002) IMA J Numer Anal 22(2):253–280
25. Ohlberger M (2001) M2AN, Math Model Numer Anal 35(2):355–387
26. Ohlberger M (2001) Numer Math 87(4):737–761
27. Rohde C (1998) Numer Math 81(1):85–124
28. Schnitzer T (2003) Discontinuous Galerkin Verfahren angewandt auf die MHD-Gleichungen, Diplomarbeit
29. Schupp B (1999) Entwicklung eines effizienten Verfahrens zur Simulation kompressibler Strömungen in 3D auf Parallelrechnern. Ph.D. thesis, Albert-Ludwigs-Universität, Mathematische Fakultät, Freiburg, http://www.freidok.uni-freiburg.de/volltexte/68
30. Wesenberg M (1998) Finite-Volumen-Verfahren für die Gleichungen der Magnetohydrodynamik in ein und zwei Raumdimensionen, Diplomarbeit
31. Wesenberg M. Efficient MHD Riemann solvers for simulations on unstructured triangular grids, to appear in: J Numer Math
32. Wesenberg M (2003) Efficient higher-order finite volume schemes for (real gas) magnetohydrodynamics, Ph.D. thesis, Freiburg
33. Vila JP (1994) RAIRO Anal Numer 28:267–295

Modified Finite Volume Method for Calculation of Oceanic Waves on Unstructured Grids

A.V. Styvrin

Institute of Computational Technologies SB RAS, Lavrentiev Ave. 6, 630090 Novosibirsk, Russia

Summary. A computational scheme for computing water surface wave propagation, based on the nonlinear shallow water equations, is described. The Taylor-Galerkin method is applied for the discretization in time. A mixed Modified Finite Volume Method on an unstructured triangular grid is used for the spatial approximation. Linear basis functions are used to approximate the wave surface and quadratic basis functions to approximate the velocity field, in order to eliminate non-physical spatial oscillations.

1 Introduction

The problem of oceanic wave propagation arises in the investigation of tsunami phenomena, which can be treated as the propagation of solitary waves on the surface of a medium. Because of the complexity of the full Navier-Stokes equations describing fluid motion with a free surface, various simplified models are applied to simulate this problem. The principal one is the shallow water equations. A comprehensive approach to the study of tsunami phenomena implies the development of appropriate numerical simulation methods. Among other features, the method should allow a precise description of complex shore boundaries and of the bathymetry distribution. This is possible with non-orthogonal and non-rectangular unstructured meshes. Moreover, unstructured meshes make it easier to build locally refined and adaptive meshes. At present, the Finite Element Method is the most frequently used method for spatial approximation on unstructured grids. Nevertheless, the Finite Volume Method has some advantages, such as simplicity and clarity, because it is based directly on the approximation of conservation laws. Therefore this research is concerned with the application of the Finite Volume Method based on finite element technology on unstructured triangular meshes to the numerical simulation of the above problem. The nonlinear shallow water system is used as the mathematical model. The aim of this work is to investigate the application of the Modified Finite Volume Method to the nonlinear shallow water model.


2 Approximation of the Nonlinear Shallow Water Equations

The classical shallow water model in non-divergent form reads

$$H_t + \nabla \cdot (H\mathbf{U}) = 0,$$
$$(H\mathbf{U})_t + \nabla \cdot (H\mathbf{U}\mathbf{U}^T) + gH\nabla\eta = 0,$$

where $H(x, y, t) = h(x, y) + \eta(x, y, t)$ is the total depth; $h(x, y)$ is the depth of the undisturbed water layer, or "bottom function"; $\eta(x, y, t)$ is the free surface elevation over the undisturbed state; $\mathbf{U} = (u, v)^T$ is the depth-averaged horizontal velocity; $u$, $v$ are the components of the depth-averaged velocity in the $x$- and $y$-direction, respectively; and $g$ is the gravity acceleration. These governing equations are considered together with boundary conditions of two types. The first type is the "open", or non-reflecting (absorbing), boundary condition, which can be written as follows:

$$\eta_t + c\,(\nabla\eta \cdot \mathbf{n}) = 0 \quad \text{on } \Gamma_{open},$$

where $c$ is the propagation speed of the wave (in the linear case $c = \sqrt{gH}$) and $\mathbf{n}$ is the unit vector normal to the boundary $\Gamma_{open}$. This artificial boundary condition is used to truncate the computational domain. The second type is the "wall" boundary condition, or condition of full reflection:

$$\mathbf{U} \cdot \mathbf{n} = 0 \quad \text{on } \Gamma_{wall},$$

where $\mathbf{n}$ is the unit vector normal to the boundary $\Gamma_{wall}$, and $\Gamma = \Gamma_{open} \cup \Gamma_{wall}$. In what follows, the shallow water equations are written in the following convenient matrix-vector form:

$$\varphi_t + \nabla \cdot (\varphi\mathbf{U}^T) + gH\,P(H) = gH\,P(h),$$

where $\varphi = (H, Hu, Hv)^T$ and $P(H) = (0, H_x, H_y)^T$. The Taylor-Galerkin scheme is applied to the time discretization of the shallow water equations in matrix-vector form. This time discretization scheme is based on Taylor series expansions of the total depth and the velocities [1]. The governing parameter $\theta$ controls the behavior of the scheme: the value $\theta = 1$ corresponds to the Crank-Nicolson scheme, which is second-order accurate, while the values $\theta = 0$ and $\theta = 2$ correspond to the explicit and the implicit scheme, respectively, both first-order accurate in time. The Modified Finite Volume Method with a finite element approach is applied for the spatial approximation. The weighted residuals principle with piecewise-constant test functions is applied for this purpose, where the test functions are as follows:

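$$w(x) = \begin{cases} 0, & x \notin \Omega_i, \\ 1, & x \in \Omega_i, \end{cases} \qquad \Omega = \bigcup_i \Omega_i,$$

where $\Omega_i$ is the finite volume for node $i$. The weighted residual form of the time-discretized equations reads

$$
\begin{aligned}
\int_\Omega w \left\{ \varphi^{k+1} + \frac{\theta}{2}\tau \left[ \nabla \cdot \left(\varphi^{k+1} (\mathbf{U}^k)^T\right) + gH^k P(H^{k+1}) \right] \right\} d\Omega
&= \int_\Omega w \left\{ \varphi^k + \frac{\theta}{2}\tau\, gH^k P(h) \right\} d\Omega \\
&\quad + \int_\Omega w \left\{ \left(\frac{\theta}{2} - 1\right)\tau \left[ \nabla \cdot \left(\varphi^k (\mathbf{U}^k)^T\right) + gH^k P(H^k) - gH^k P(h) \right] \right\} d\Omega .
\end{aligned}
$$

Finally, applying the Gauss-Ostrogradski formula, one can pass to an integration over the finite volume boundaries [2]:

$$
\begin{aligned}
&\int_{\Omega_i} \begin{pmatrix} \varphi_1^{k+1} \\ \varphi_2^{k+1} \\ \varphi_3^{k+1} \end{pmatrix} d\Omega
+ \frac{\theta}{2}\tau \oint_{\partial\Omega_i} \begin{pmatrix} \varphi_2^{k+1}\,dy - \varphi_3^{k+1}\,dx \\ \varphi_2^{k+1} u^k\,dy - \varphi_2^{k+1} v^k\,dx \\ \varphi_3^{k+1} u^k\,dy - \varphi_3^{k+1} v^k\,dx \end{pmatrix}
+ \frac{\theta}{2}\tau g \int_{\Omega_i} \begin{pmatrix} 0 \\ H^k (\varphi_1^{k+1})_x \\ H^k (\varphi_1^{k+1})_y \end{pmatrix} d\Omega \\
&\quad = \int_{\Omega_i} \begin{pmatrix} \varphi_1^{k} \\ \varphi_2^{k} \\ \varphi_3^{k} \end{pmatrix} d\Omega
+ \tau g \int_{\Omega_i} \begin{pmatrix} 0 \\ H^k h_x \\ H^k h_y \end{pmatrix} d\Omega
+ \left(\frac{\theta}{2} - 1\right)\tau g \int_{\Omega_i} \begin{pmatrix} 0 \\ H^k (\varphi_1^{k})_x \\ H^k (\varphi_1^{k})_y \end{pmatrix} d\Omega \\
&\qquad + \left(\frac{\theta}{2} - 1\right)\tau \oint_{\partial\Omega_i} \begin{pmatrix} \varphi_2^{k}\,dy - \varphi_3^{k}\,dx \\ \varphi_2^{k} u^k\,dy - \varphi_2^{k} v^k\,dx \\ \varphi_3^{k} u^k\,dy - \varphi_3^{k} v^k\,dx \end{pmatrix} .
\end{aligned}
$$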
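The effect of the weight θ in the time discretization can be illustrated on a much simpler problem. The following minimal sketch is not the shallow water scheme itself: it applies the same weighting (θ/2 on the new time level, 1 − θ/2 on the old one) to the scalar model equation y' = −λy, which is an assumption made purely for illustration, as are the function names. Halving the time step should reduce the error by a factor of about 4 for θ = 1 (second order) and by about 2 for θ = 0 and θ = 2 (first order).

```python
import math


def theta_step(y, lam, tau, theta):
    """One step of the theta-weighted scheme for the model problem y' = -lam * y.

    The new time level carries the weight theta/2 and the old level
    1 - theta/2, mirroring the weights in the weak form above.
    """
    implicit_weight = 0.5 * theta
    explicit_weight = 1.0 - 0.5 * theta
    return y * (1.0 - explicit_weight * lam * tau) / (1.0 + implicit_weight * lam * tau)


def final_error(theta, tau, lam=1.0, t_end=1.0):
    """Absolute error at t_end against the exact solution exp(-lam * t_end)."""
    steps = round(t_end / tau)
    y = 1.0
    for _ in range(steps):
        y = theta_step(y, lam, tau, theta)
    return abs(y - math.exp(-lam * t_end))


def error_ratio(theta):
    """Error reduction factor when the time step is halved from 0.1 to 0.05."""
    return final_error(theta, 0.1) / final_error(theta, 0.05)


if __name__ == "__main__":
    for theta in (0.0, 1.0, 2.0):
        print(f"theta = {theta}: error ratio = {error_ratio(theta):.2f}")
```

For the full system the same weights multiply the convective and pressure terms instead of the scalar λy.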



A dual mesh is constructed from the centroids of the triangles and the midpoints of their edges, such that each node has a corresponding "complete" or "incomplete" finite volume bounded by median segments [3]. The total depth is represented on each triangle as follows:

$$\varphi_1 = \sum_{i=1}^{3} \hat\varphi_{1,i} L_i,$$

where $\hat\varphi_{1,i}$ is the value of the total depth at node $i$ and $L_i$ are the piecewise-linear basis functions. However, if the total depth and the velocities in the mixed Modified Finite Volume Method are both discretized with basis functions of the same first order, that is, if the wave surface and the velocity components are computed at the same nodes, their joint approximation degenerates. As in mixed Finite Element Methods for this problem, and in order to satisfy the Ladyzhenskaya-Babuška-Brezzi condition, an original Modified Finite Volume Method with quadratic basis functions is proposed for the approximation of the momentum equation [4]. In this case the dual mesh is constructed from the medians of the four small triangles on each simplex, which are based on the vertices of the triangle and the midpoints of its edges. The velocity functions are represented on each triangle as follows:

$$\varphi_2 = \sum_{i=1}^{6} \hat\varphi_{2,i} L_i^{(2)}, \qquad \varphi_3 = \sum_{i=1}^{6} \hat\varphi_{3,i} L_i^{(2)},$$

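where $\hat\varphi_{2,i}$, $\hat\varphi_{3,i}$ are the values of the velocity at node $i$ in the $x$- and $y$-direction, respectively, and $L_i^{(2)}$ are the piecewise-quadratic basis functions. The piecewise-quadratic basis functions are related to the piecewise-linear ones as follows:

$$
\begin{aligned}
L_1^{(2)} &= 2L_1^2 - L_1, & L_4^{(2)} &= 4L_1 L_2, \\
L_2^{(2)} &= 2L_2^2 - L_2, & L_5^{(2)} &= 4L_2 L_3, \\
L_3^{(2)} &= 2L_3^2 - L_3, & L_6^{(2)} &= 4L_1 L_3 .
\end{aligned}
$$

The computational nodes for the representation of the total depth and the velocities are shown in Figure 1. Thus, the mixed Modified Finite Volume approximation consists in the construction of discrete analogs of the integrals over the finite volume areas and along the finite volume edges. These integrals are evaluated exactly by the following formulas [5]:

$$\int_{T_k} L_1^{n_1} L_2^{n_2} L_3^{n_3}\, d\Omega = \frac{n_1!\, n_2!\, n_3!}{(n_1 + n_2 + n_3 + 2)!}\; 2\, \operatorname{meas} T_k,$$
$$\int_{J} L_1^{n_1} L_2^{n_2}\, dS = \frac{n_1!\, n_2!}{(n_1 + n_2 + 1)!}\; \operatorname{meas} J,$$

where $\operatorname{meas} T_k$ is the area of triangle $T_k$, $\operatorname{meas} J$ is the length of edge $J$, and $L_i$ are the piecewise-linear basis functions. Applying the proposed mixed Modified Finite Volume approximation together with the Taylor-Galerkin discretization to each finite volume of the computational domain yields a system of linear algebraic equations, which can be written as follows:

$$
\begin{aligned}
&\left(\begin{array}{ccc}
\tilde M_H & \multicolumn{2}{c}{\theta\frac{\tau}{2}\tilde J} \\
\theta\frac{\tau}{2} g\tilde K_u & \tilde M_u + \theta\frac{\tau}{2}\tilde I_u & 0 \\
\theta\frac{\tau}{2} g\tilde K_v & 0 & \tilde M_v + \theta\frac{\tau}{2}\tilde I_v
\end{array}\right)
\begin{pmatrix} H^{k+1} \\ Hu^{k+1} \\ Hv^{k+1} \end{pmatrix} \\
&\quad = \left(\begin{array}{ccc}
\tilde M_H & \multicolumn{2}{c}{\left(\frac{\theta}{2} - 1\right)\tau\tilde J} \\
\left(\frac{\theta}{2} - 1\right)\tau g\tilde K_u & \tilde M_u + \left(\frac{\theta}{2} - 1\right)\tau\tilde I_u & 0 \\
\left(\frac{\theta}{2} - 1\right)\tau g\tilde K_v & 0 & \tilde M_v + \left(\frac{\theta}{2} - 1\right)\tau\tilde I_v
\end{array}\right)
\begin{pmatrix} H^{k} \\ Hu^{k} \\ Hv^{k} \end{pmatrix}
+ \begin{pmatrix} 0 & 0 & 0 \\ 0 & \tau g\tilde K_u & 0 \\ 0 & 0 & \tau g\tilde K_v \end{pmatrix}
\begin{pmatrix} 0 \\ h \\ h \end{pmatrix},
\end{aligned}
$$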

where the blocks $\tilde M_H$, $\tilde M_u$, $\tilde M_v$ are global mass matrices, and $\tilde J$, $\tilde I_u$, $\tilde I_v$, $\tilde K_u$, $\tilde K_v$ are global stiffness matrices. The matrix of this system is sparse and nonsymmetric, but has a symmetric profile. Therefore, the matrix is stored in sparse row format, and the system is solved with the iterative stabilized BiConjugate Gradient method (BiCGStab) [6].

Fig. 1. Dual mesh and computational nodes for: (a) piecewise-linear basis functions, (b) piecewise-quadratic basis functions
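The exact integration formula for products of the piecewise-linear basis functions, which underlies the assembly of the matrix blocks, can be checked with a few lines of code. The sketch below is an illustration only and not the author's implementation: it stores polynomials in L1, L2, L3 as dictionaries of monomial coefficients (a representation chosen here for convenience), integrates them over a triangle of unit area, and builds the local mass matrix of the six quadratic basis functions. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def integrate_monomial(n1, n2, n3):
    """Integral of L1^n1 * L2^n2 * L3^n3 over a triangle of unit area."""
    return Fraction(2 * factorial(n1) * factorial(n2) * factorial(n3),
                    factorial(n1 + n2 + n3 + 2))


def multiply(p, q):
    """Product of two polynomials stored as {(n1, n2, n3): coefficient}."""
    result = {}
    for (a1, a2, a3), ca in p.items():
        for (b1, b2, b3), cb in q.items():
            key = (a1 + b1, a2 + b2, a3 + b3)
            result[key] = result.get(key, 0) + ca * cb
    return result


def integrate(p):
    """Exact integral of a polynomial in L1, L2, L3 over the unit-area triangle."""
    return sum(c * integrate_monomial(*exponents) for exponents, c in p.items())


# Quadratic basis functions, numbered as in the text.
QUADRATIC_BASIS = [
    {(2, 0, 0): 2, (1, 0, 0): -1},  # L1^(2) = 2 L1^2 - L1
    {(0, 2, 0): 2, (0, 1, 0): -1},  # L2^(2) = 2 L2^2 - L2
    {(0, 0, 2): 2, (0, 0, 1): -1},  # L3^(2) = 2 L3^2 - L3
    {(1, 1, 0): 4},                 # L4^(2) = 4 L1 L2
    {(0, 1, 1): 4},                 # L5^(2) = 4 L2 L3
    {(1, 0, 1): 4},                 # L6^(2) = 4 L1 L3
]


def mass_matrix(basis):
    """Local mass matrix: integrals of all pairwise products of basis functions."""
    return [[integrate(multiply(bi, bj)) for bj in basis] for bi in basis]


if __name__ == "__main__":
    for row in mass_matrix(QUADRATIC_BASIS):
        print([str(entry) for entry in row])
```

Since the six quadratic basis functions sum to one, all entries of the local mass matrix must add up to the triangle area, which gives a simple consistency check. For a triangle of area meas T_k every entry is multiplied by meas T_k.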

3 Numerical results

As an example of the performance of the scheme, the scattering of a solitary wave by a conical island has been simulated numerically. This problem is described in [7] as Benchmark problem 2. The center of the conical island was located at X = 12.96 m and Y = 13.8 m in a 30-m-wide and 25-m-long flat-bottom computational domain. The water depth was set to 32 cm. The conical island had diameters of 7.2 m at the toe and 2.2 m at the crest. The vertical height of the island was 62.5 cm.


Since the goal of this modeling is not the wave runup on the island, its geometric parameters were slightly changed. The upper part of the conical island was replaced by a cylinder with a diameter equal to the diameter of the conical island at h = 0. The height of this cylinder was equal to 0.2 of the total height of the island. The initial water elevation $\eta$ and the velocities $u$, $v$, obtained from solitary wave theory, are specified as follows:

$$\eta(x_0, t) = H_0\, \mathrm{sech}^2\!\left( \sqrt{\frac{3H_0}{4h^3}}\,(x_0 - ct) \right), \qquad u(x_0, t) = \frac{c\,\eta(x_0, t)}{h + \eta(x_0, t)}, \qquad v(x_0, t) = 0.$$

The initial wave height $H_0$ was 0.0125 m, the position $x_0 = 0$, and the initial time 0. The computational mesh for the mixed Modified Finite Volume Method had 17735 nodes and 34916 triangles, and the characteristic edge length was 0.25 m. The time step $\tau$ was 0.025 s. Figure 2 shows a fragment of the triangulation around the island, and Figures 3 and 4 show the propagation of the solitary wave through the basin.
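The initial data above can be evaluated directly. The following sketch uses the wave height and water depth quoted in the text. The value of g and the choice of the wave speed c = sqrt(g (h + H0)) are assumptions made here for illustration, since the text does not state which celerity was used in the computation. The function names are hypothetical.

```python
import math

G = 9.81      # gravity acceleration in m/s^2 (assumed value)
DEPTH = 0.32  # still water depth h in m (from the text)
H0 = 0.0125   # initial wave height in m (from the text)


def wave_speed(h=DEPTH, h0=H0, g=G):
    """Assumed solitary wave celerity c = sqrt(g * (h + H0))."""
    return math.sqrt(g * (h + h0))


def elevation(x, t, h=DEPTH, h0=H0):
    """Free surface elevation eta = H0 * sech^2(sqrt(3 H0 / (4 h^3)) * (x - c t))."""
    k = math.sqrt(3.0 * h0 / (4.0 * h ** 3))
    return h0 / math.cosh(k * (x - wave_speed(h, h0) * t)) ** 2


def velocity(x, t, h=DEPTH, h0=H0):
    """Depth-averaged velocity (u, v) with u = c * eta / (h + eta) and v = 0."""
    eta = elevation(x, t, h, h0)
    return wave_speed(h, h0) * eta / (h + eta), 0.0


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        u, v = velocity(x, 0.0)
        print(f"x = {x:5.1f} m  eta = {elevation(x, 0.0):.5f} m  u = {u:.5f} m/s")
```

At the crest (x = ct) the elevation equals H0, and the profile is symmetric about the crest.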

Fig. 2. Part of triangular mesh around island

Gauges for measuring the maximal water elevation were located along the perimeter of the island. The results of the numerical simulation have been compared with existing simulation data [8], obtained for this problem with nonlinear and nonlinear-dispersive models by the Finite Difference method. Figure 5 shows the comparison between the computed maximal wave height around the island shoreline and the existing data. Two of the lines represent the existing data from Finite Difference approximations of the nonlinear and the nonlinear-dispersive shallow water models, respectively. A third line represents the maximal wave heights obtained using the Modified Finite Volume approximation of the wave equations. The remaining lines, among them −♦− and −◦−, correspond to the heights obtained using the proposed mixed Modified Finite Volume approximation of the nonlinear shallow water equations. The computed results are in good agreement with the existing ones.

Fig. 3. Wave surface at: (a) 4 s, (b) 7 s

Fig. 4. Wave surface at 10 s


Fig. 5. Maximal wave height on island shoreline

4 Conclusion

In this article a method for the simulation of nonlinear surface waves on water has been proposed that works in computational regions with complicated geometry. A feature of the mixed Modified Finite Volume Method used here is that only information about the primary grid is stored: all information about the finite volumes and about the additional nodes at the midpoints of the edges of the triangular elements is recomputed, which substantially reduces the memory required for the simulation.

References

1. Ambrosi D, Quartapelle L (1998) J Comp Phys 146:546–569
2. Stywrin AV, Shurina EP, Chubarov LB (2002) A feature of FVM/FEM-approach for modeling surface waves on water. In: International Conference on Computational Mathematics, Novosibirsk
3. Prakash C, Patankar SV (1985) Numer Heat Transfer 8:259–280
4. Stywrin AV, Shurina EP, Chubarov LB (2003) Comp Techn 8:109–122
5. Eisenberg MA, Malvern LE (1973) Int J Numer Methods Eng 7:574–575
6. Saad Y (1996) Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston
7. Briggs MJ, Synolakis CE, Harkins GS, Green DR (1995) Runup of solitary waves on a circular island. In: Long-Wave Runup Models. International Workshop on Long-Wave Runup Models. Friday Harbor, San Juan Island, Washington, USA
8. Chubarov LB, Fedotova ZI, Shokin YuI, Einarsson B (2000) Int J Comp Fluid Dyn 14(1):55–73

Performance Aspects on High Performance Computers: From Microprocessors to Highly Parallel SMP Systems

H. Mix (1) and W.E. Nagel (2)

1 Center for High Performance Computing (ZHR), Dresden University of Technology, 01062 Dresden, Germany [email protected]
2 Center for High Performance Computing (ZHR), Dresden University of Technology, 01062 Dresden, Germany [email protected]

Summary. With the growing demands of computer modeling and simulation, the programming techniques used to implement efficient algorithms and procedures must keep up with the rapid progress in computer architecture. Beyond moderately parallel vector computers, parallel computers in quite "massively parallel" form have been pushing into the market for several years now. In many cases, the efficient use of such parallel computers is a challenge not only to parallel programming but also to the effective utilization of the large performance potential of the underlying microprocessor. Today, in many cases the sustained single-PE performance of a large HPC application is on the order of a few percent of the advertised peak speed of the microprocessor. This still limits the success of such machines, especially in large-scale environments. The paper discusses aspects of programming and optimization in HPC applications on parallel computers. Some emphasis is placed on supportive software tools.

1 Introduction

For current high performance computing applications, the use of very efficient algorithms and procedures is not the only decisive factor. In many cases, the efficient use of parallel computers is a challenge not only to parallel programming but also to the effective utilization of the large performance potential of the underlying microprocessor. Besides a deep memory hierarchy, current and future processor units offer a variety of different levels of parallel processing in combination with an increasing number of intelligently organized functional units. The programmer's know-how in program optimization, the choice of compiler version, and the use of compiler options also have an important influence on program runtime. The rapid progress of computer architectures therefore requires a large amount of experience and knowledge from the programmer of modern computer applications. To help software engineers in this process, it is also necessary to provide them with excellent performance analysis and measurement tools.

2 Performance Comparisons

To draw an exact picture of the performance of at least one well-known program kernel, several variants of matrix multiplication have been studied on different machines. The program variants come from two languages, C and Fortran, and from the six possible permutations of the loop order. Figure 1 shows the performance results on different machines: AMD Athlon, 1.2 GHz ("radi"), Compaq Alpha-PC ("roman"), Hitachi SR8000, and an Intel Itanium-2, 1 GHz machine (top to bottom). The x-axis shows the matrix dimension, the y-axis the performance in MFLOPS. The results show that the achieved performance depends strongly on the algorithm used, but also on the matrix dimension. The behavior differs considerably between the processor architectures. The Fortran version outperforms the C version in all cases. The reason is that optimizing Fortran compilers in most cases detect the matrix multiplication as a kernel and generate hand-optimized code for that part of the program. This is not true for the C version. Here, performance losses by factors between 1.5 and about 7 can be observed, which shows that current architectures allow reasonable performance to be achieved, but in most cases only with hand-optimized code.
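The six loop-order variants mentioned above can be sketched as follows. The study itself used C and Fortran; the Python version below only illustrates the structure of such an experiment, and interpreter overhead will largely mask the memory access effects that dominate in compiled code. The MFLOPS rate is computed from the usual count of 2n³ floating-point operations per multiplication. All names are hypothetical.

```python
import itertools
import time


def matmul(a, b, order):
    """Multiply square matrices a and b with the loop nest in the given order.

    order is a permutation of "ijk", outermost loop first. The loop
    variables are kept in a dictionary so that one function covers all
    six permutations.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    idx = {}
    for idx[order[0]] in range(n):
        for idx[order[1]] in range(n):
            for idx[order[2]] in range(n):
                i, j, k = idx["i"], idx["j"], idx["k"]
                c[i][j] += a[i][k] * b[k][j]
    return c


def mflops(n, seconds):
    """Rate in MFLOPS, assuming 2 * n^3 floating-point operations."""
    return 2.0 * n ** 3 / seconds / 1.0e6


def run_all(n):
    """Time all six loop orders for one matrix dimension n."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    rates = {}
    for perm in itertools.permutations("ijk"):
        order = "".join(perm)
        start = time.perf_counter()
        matmul(a, b, order)
        rates[order] = mflops(n, time.perf_counter() - start)
    return rates


if __name__ == "__main__":
    for order, rate in run_all(60).items():
        print(f"{order}: {rate:8.2f} MFLOPS")
```

Repeating `run_all` over a range of matrix dimensions gives curves of the kind described for Figure 1, one per loop order.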

Fig. 1. Performance of Fortran-coded (left) and C-coded (right) matrix multiplication

3 Performance Measurement

To compare and estimate the performance of different computer systems, well-established benchmark programs are usually used. However, because computer architectures differ so strongly, traditional benchmarks quite often highlight only a few aspects of the performance behavior. To support flexible performance measurement, a new tool environment called BenchIT [1] has been developed by the Center for High Performance Computing at the Technical University of Dresden. Performance evaluation normally has two steps: the performance measurement itself, and the visualization and comparison of the obtained results. Therefore, the BenchIT program has a modular design consisting of three layers (as shown in Figure 2): the measuring kernels, a main program for the measurements, and a web-based display engine to plot and compare all gathered data.

Within the BenchIT project a "kernel" is an algorithm or measuring program. Typical kernels are, for example, matrix multiplications, the Jacobi algorithm, certain mathematical operations, or special MPI test programs (e.g. Roundtrip-Message and Binary-Tree-Broadcast). Programming a kernel requires a certain discipline from the kernel author. Since BenchIT is to run on a variety of computing platforms, the kernel code has to be compatible with all of them. This is best accomplished by using only basic program structures, avoiding system calls and system-specific operations, and utilizing the functions provided by the main program. The declared goal of the BenchIT team is that every kernel distributed with BenchIT is executable on every platform. Nevertheless, it is possible, and no less valued, to write a problem-specific kernel. A typical use for this strategy might be the optimization of a certain algorithm on a specific target architecture. Every BenchIT user is also able and invited to act as the author of a kernel. A "custom" kernel can then be sent to the BenchIT team and will be included in the kernel set if it is considered useful and complies with the kernel rules.

The main program controls the generation of measurement data by the kernels, offers service routines, and writes the result file. The main program has to operate (just like the kernels) under a wide variety of system environments. However, the environment of the operating system is just one part of this variety. Another issue is the runtime environment. Since BenchIT also supports MPI as a parallel environment, the main program has to adapt itself to that as well. One might argue that it would also be feasible to have different main programs for each runtime environment. So far this is considered an unnecessary code redundancy, especially since using just one main file has been practicable in all cases investigated.

Fig. 2. Components of the BenchIT-Project (the webserver for displaying and comparing results reads the result file; the main program, which runs the measurement, writes the result file and provides interface.h; the kernel, which provides the algorithm, fulfills interface.h)

One measurement run follows the scheme shown in Figure 3. During the measurement, the main program calls the kernel with a certain problem size not just once; instead, the measurement is repeated n times. This enables a certain error correction for the kernels, since performance differences during a measurement run for one problem size, caused by other system processes running on the CPU, are inevitable. Each kernel informs the main program in the init routine whether the outliers of each function have to be expected upwards or downwards. BenchIT then uses the best value of the n runs. After the measurement, the main program analyzes all gathered data. In this step, minima and maxima are gathered and useful display boundaries are calculated. The main program then writes the results to an output file as well as to a gnuplot file that can be used by the local QUICKVIEW.

Fig. 3. Schematic view of one measurement run (initialize program and kernel; measure one problem size; repeat while there is still time left; analyze data; write result and quickview files)

The communication between the three BenchIT program layers is enabled by only two interface files. This ensures that the modules have a common interface to work together. The result file or output file collects all the relevant data of the measuring kernels. All results are coded in ASCII format for easy viewing and editing. The output file is created on the local machine and is then transferred to the BenchIT webserver. The two acquisition layers are linked through a common C header file (interface.h). It defines an information structure in which a kernel provides details about itself, and it furthermore specifies the functions called by the main program and the service functions to be used by the kernels.

The BenchIT kernels generate a large amount of measurement results, depending on the number of functional arguments. First, it is possible to display this data with a QUICKVIEW tool on the local machine for each measuring program run. The additional BenchIT web interface complements the project by making it possible to plot the results of many measuring kernels and compare them directly. The user can thus show selected results of different measuring programs in a single coordinate system. Therefore, it is important how the plots are assembled and how the user can customize them. The BenchIT team has so far implemented two strategies:

Selection by architectural characteristics: The first possibility is to compare different values of one architectural feature. It is possible to show the sensitivity of the results of the measuring kernels to the physical size of one architectural feature. This way it is possible to look for specific performance data for a given architectural feature and compare it to other architectures.

Selection by the measuring kernel: The second possibility compares different characteristics of architectures, which are all calculated by just one measuring kernel. Often there are different reasons that can cause characteristic minima, maxima, or a special shape in a graph. It is then necessary to collect additional information about the tested system to explain such effects on the basis of well-known system properties and physical values of the implementation.

The BenchIT project aims to provide an evaluation platform by offering a variety of measurement kernels as well as an easily accessible display engine, thus enabling an easy way to measure performance on a specific system and to compare the result, which is a full graph instead of just a number, with results contributed by others.
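The measurement scheme described above (repeat each problem size n times, keep the best value according to the expected direction of the outliers, stop when the time budget is used up) can be sketched in a few lines. This is not BenchIT code and does not reproduce its interface.h: the function names, the parameters, and the result layout are hypothetical and chosen only to mirror the description.

```python
import time


def best_of(values, outliers_upwards):
    """Pick the best of n repeated measurements.

    If disturbances push the measured value upwards (e.g. a run time), the
    minimum is the least disturbed result; if they push it downwards (e.g. a
    MFLOPS rate), the maximum is.
    """
    return min(values) if outliers_upwards else max(values)


def measure(kernel, problem_sizes, repetitions=3, time_limit=10.0,
            clock=time.perf_counter):
    """Run kernel(size) `repetitions` times per problem size within a time budget.

    Returns a list of (size, best run time) pairs.
    """
    results = []
    start = clock()
    for size in problem_sizes:
        if clock() - start > time_limit:  # the "still time left?" decision
            break
        timings = []
        for _ in range(repetitions):
            t0 = clock()
            kernel(size)
            timings.append(clock() - t0)
        results.append((size, best_of(timings, outliers_upwards=True)))
    return results


def display_bounds(results):
    """Minimum and maximum of the measured values, e.g. for plot boundaries."""
    values = [value for _, value in results]
    return min(values), max(values)


if __name__ == "__main__":
    def toy_kernel(n):
        # Placeholder workload standing in for a real measuring kernel.
        return sum(i * i for i in range(n))

    for size, seconds in measure(toy_kernel, [1000, 10000, 100000]):
        print(f"{size:8d}  {seconds:.6f} s")
```

Passing the clock as a parameter keeps the loop testable with a deterministic time source.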

4 Performance Analysis

Performance optimization remains one of the key issues in parallel computing. To assist programmers in achieving fast and effective program codes, an appropriate organization of performance data is necessary. The potentially enormous amount of generated performance information (in particular if event tracing is used) has to be processed and displayed in a way that an ordinary user can understand.

Up to now, the predominant parallel architecture classes have been the classic shared-memory systems with limited scalability and the scalable distributed-memory MPP systems. For each class, performance analysis methodologies and tools (like AIMS [2], PABLO [3], PARADYN [4], Paragraph [5], Paraver [6], Vampir [7]-[11]) have been developed, and many significant scientific and engineering codes have been ported and optimized. An application programmer aiming at scalability beyond one SMP node currently has to choose between the message-passing programming model and a combination of message passing between SMP nodes, most probably MPI, with a shared-memory model within one node, most probably OpenMP. In both cases, performance analysis faces new challenges, and there is already a strong need to extend the current performance analysis tools with support for multi-threading, analysis of memory and CPU statistics, and the display of scheduler events and interactions. With the emergence of Grid applications running on more than one system, the task of analyzing and tuning scientific applications becomes even harder. Tools need to be extended to enable performance analysis for this new application class in global Grid environments as well.


The program Vampir [7], developed at the Center for High Performance Computing at the Technical University of Dresden, is a performance analysis tool that has addressed these topics for many years. It supports the performance analysis process and makes it easy for the programmer to gain insight into the parallel execution of a program on any kind of parallel system. Vampir converts trace information into a variety of graphical views, e.g. timeline displays showing state changes and communication, profiling statistics displaying the execution times of routines, communication statistics indicating data volumes and transmission rates, and more. The displays can be related to the source code, and Vampir's advanced navigation functions make it easy to zoom into arbitrary time intervals. The profiling and communication statistics help in identifying performance bottlenecks. With instrumentation libraries like Vampirtrace [12], all function or subroutine calls, calls to the MPI library, and all transmitted messages are recorded during the application's run in a tracefile that can be analyzed later with the Vampir performance analysis tool. An instrumentation API allows arbitrary user-defined events to be defined and recorded. Hardware performance monitor (HPM) counters, which are available today on all processor chips, can also be sampled. Since parallel programs usually produce a huge amount of trace data during a trace session, a powerful configuration mechanism allows Vampirtrace's operation to be customised, e.g. by filtering the trace data at runtime.

Fig. 4. Vampir Timeline with Zooming Capabilities

Because Vampir has access to a trace that describes exactly when an event occurred in the application, it can draw these changes over time in so-called Timeline Displays (see Figure 4). One of the key features of Vampir is its advanced navigation functions, which make it easy to zoom into arbitrary time intervals. Thus, the programmer proceeds from a bird's-eye view of the whole trace to a finer granularity of detail. Other open displays, such as statistics windows, respond to the zoom (or magnification) in the time series in a context-sensitive way and automatically adapt to the selected time range. If counter sample data is stored in the trace, Vampir can display the counters in a special counter timeline or combined with the call stack in a per-process timeline, so that the user can correlate hardware behavior with the program structures (see Figure 5).

Fig. 5. Vampir Process Timeline with hardware counter

A variety of different displays are available in Vampir to show time series, statistical data, etc. Because of the volume of information in the trace file, data viewed in a Vampir display is generally defined by filters. As examples, filters may be chosen on processes, on time, by activity, or by symbol, depending on the display and menu options.
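How profiling statistics can follow a selected time range may be illustrated with a small sketch. The event format used below (time, process, "enter" or "leave", function name) is a simplified assumption for illustration and is not the Vampir or Vampirtrace trace format. The function accumulates the inclusive time per process and function and restricts each call to the part that overlaps the chosen interval, in the spirit of statistics that adapt to the current zoom.

```python
from collections import defaultdict


def profile(events, t_start=float("-inf"), t_end=float("inf")):
    """Inclusive time per (process, function) from enter/leave events.

    events: iterable of (time, process, kind, function), sorted by time,
    where kind is "enter" or "leave". Only the part of each call that lies
    inside [t_start, t_end] is counted.
    """
    stacks = defaultdict(list)   # per-process call stack of (function, enter time)
    totals = defaultdict(float)
    for time, process, kind, function in events:
        if kind == "enter":
            stacks[process].append((function, time))
        elif kind == "leave":
            name, entered = stacks[process].pop()
            if name != function:
                raise ValueError(f"unbalanced trace: left {function}, expected {name}")
            overlap = min(time, t_end) - max(entered, t_start)
            if overlap > 0:
                totals[(process, function)] += overlap
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return dict(totals)


if __name__ == "__main__":
    trace = [
        (0.0, 0, "enter", "main"),
        (1.0, 0, "enter", "MPI_Send"),
        (3.0, 0, "leave", "MPI_Send"),
        (4.0, 0, "leave", "main"),
    ]
    print(profile(trace))              # whole trace
    print(profile(trace, 2.0, 10.0))   # zoomed into the interval [2, 10]
```

Filters on processes or symbols could be added in the same way, by skipping events that do not match before they reach the call stacks.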

5 Summary and Outlook

Many parallel applications do not benefit sufficiently from the increasing peak performance of symmetric multi-processor systems, parallel processor systems, or clusters of workstations. This is because most applications are not designed to run efficiently on a large number of processors or on architectures with different interconnect fabrics. This leads directly to a lack of scalability, or load balancing problems, or both, and to poor performance of the parallel application. Therefore, performance optimisation and sophisticated tools remain key requirements for the next decade and will grow in importance. With the programs BenchIT and Vampir, we have presented two performance tools that should assist programmers in achieving fast and effective program codes.

Fig. 6. Vampir Message Statistics

References

1. Juckeland G, Boerner S, Kluge M, Koelling S, Nagel WE, Pflueger S, Roeding H, Seidl S, William T, Wloch R (2003) BenchIT – Performance Measurement and Comparison for Scientific Applications. In: Proceedings of the International Conference on Parallel Computing (ParCo2003), Elsevier Science (to be published)
2. Yan JC (1994) Performance Tuning with AIMS – An Automated Instrumentation and Monitoring System for Multicomputers. In: Proceedings of the 27th Hawaii International Conference on System Sciences, Wailea, Hawaii II, http://www.nas.nasa.gov/Groups/Tools/Projects/AIMS
3. DeRose L, Reed DA (1999) SvPablo: A Multi-Language Architecture-Independent Performance Analysis System. In: Proceedings of the International Conference on Parallel Processing (ICPP'99), Fukushima, Japan
4. Miller BP, Callaghan MD, Cargille JM, Hollingsworth JK, Irvin RB, Karavanic KL, Kunchithapadam K, Newhall T (1995) IEEE Computer 28(11):37–46
5. Heath MT, Malony AD, Rover DT (1998) Visualization for parallel performance evaluation and optimization. In: Stasko IJ, Domingue J, Brown MH, Price BA (eds): Software Visualization, MIT Press, Cambridge
6. Labarta J, Girona S, Pillet V, Cortés T, Gregoris L (1996) DiP: A Parallel Program Development Environment. In: 2nd International EuroPar Conference (EuroPar 96), Lyon, France
7. Nagel WE, Arnold A, Weber M, Hoppe H-C, Solchenbach K (1996) VAMPIR: Visualization and Analysis of MPI Resources. In: "Supercomputer 63" XII(1):69–80
8. Brunst H, Hoppe H-C, Nagel WE, Winkler M (2001) Performance Optimization for Large Scale Computing: The Scalable VAMPIR Approach. In: Proceedings of ICCS2001, Springer LNCS 2074
9. Brunst H, Nagel WE, Hoppe H-C (2001) Group Based Performance Analysis for Multithreaded SMP Cluster Applications. In: Proceedings of Euro-Par2001, Springer LNCS 2150
10. Hoeflinger J, Kuhn B, Nagel WE, Petersen P, Rajic H, Shah S, Vetter J, Voss M (2001) An Integrated Performance Visualizer for MPI/OpenMP Programs. In: Proceedings of WOMPAT2001, 40ff, Springer LNCS 2104
11. Brunst H, Gabriel E, Lange M, Mueller MS, Nagel WE, Resch MM (2003) Performance Analysis of a Parallel Application in the GRID. In: Sloot P, Abramson D, Bogdanov AV, Dongarra J, Zomaya AY, Gorbachev YE (eds): Computational Science - ICCS2003, Springer LNCS 2658
12. http://www.pallas.com