
9th GRACM International Congress on Computational Mechanics, Chania, 4-6 June 2018

A PARALLEL ALGORITHM FOR THE EMBEDDED REINFORCEMENT MESH GENERATION OF LARGE-SCALE REINFORCED CONCRETE MODELS

George Markou
Department of Civil Engineering
Universidad Católica de la Santísima Concepción
Concepción, Alonso de Ribera 2850, Chile
e-mail: [email protected]

Keywords: Parallel Algorithms, Embedded Reinforcement, Mesh Generation, Large-Scale Models.

Abstract. Modeling large-scale reinforced concrete structures under monotonic and cyclic loading imposes a significant computational demand, given that the models can incorporate hundreds of thousands of embedded rebars. In order to discretize and simulate any reinforced concrete structure with the 3D detailed modeling approach, the embedded mesh has to be generated prior to the analysis, a procedure that is controlled by the hexahedral mesh used to discretize the concrete domain. Managing the computational demand of the embedded mesh generation procedure can be challenging and time-consuming, especially when the numerical models require more than half a million embedded rebars. This research work presents a simple but efficient parallel implementation of this procedure. The OpenMP API specification for shared-memory parallelization is adopted herein so that the proposed embedded mesh generation algorithm can use multiple cores during the search for and creation of embedded rebars within large-scale hexahedral meshes. In order to investigate the performance of the proposed algorithm, a reinforced concrete model of a Reactor Building was constructed that comprises 181,076 concrete hexahedrons and 2,703,400 embedded rebar elements.
1 INTRODUCTION

The accurate assessment of reinforced concrete (RC) structures through 3D detailed modeling (under monotonic and cyclic loading [1-3]) relies on discretizing the concrete domain with isoparametric hexahedral elements (8-, 20- or 27-noded), which treat crack openings through the smeared crack approach, and on modeling the reinforcement with embedded rod or beam finite elements. In order to locate the embedded rebar elements that lie within each hexahedral element, a search procedure is performed to find the intersections of the embedded macro-elements [4] with the hexahedral mesh (see Fig. 1), while the nodes of the embedded bar macro-elements are also checked to determine whether they are located within a hexahedral element or lie on one of its surfaces. This procedure can be time-consuming when the analysis involves a large-scale model [4] that consists of hundreds of thousands of embedded rebar elements.
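As a minimal sketch of the node-location check described above, assuming the simplest case of an undistorted, axis-aligned rectangular hexahedron, the natural coordinates of a point can be computed explicitly and compared against the range [-1, 1]. For general (distorted) hexahedra the natural coordinates require an iterative solution, which is not shown here, and the exact symmetry criterion of [4] is not reproduced. The module and procedure names are hypothetical and are not taken from Reconan FEA.

```fortran
module box_natural_coords
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)
  public :: natural_coords_box, inside_or_on_box
contains
  ! Explicit natural coordinates (xi, eta, zeta) of point p for an
  ! axis-aligned rectangular box with opposite corners xmin and xmax.
  ! For such a box the trilinear isoparametric map reduces to
  ! x = x_center + xi * half_length, so no Newton iteration is needed.
  pure function natural_coords_box(p, xmin, xmax) result(xi)
    real(dp), intent(in) :: p(3), xmin(3), xmax(3)
    real(dp) :: xi(3)
    xi = (2.0_dp * p - (xmin + xmax)) / (xmax - xmin)
  end function natural_coords_box

  ! True if p lies inside the box or on one of its faces, i.e. all
  ! natural coordinates are within [-1, 1] up to the tolerance tol.
  pure logical function inside_or_on_box(p, xmin, xmax, tol)
    real(dp), intent(in) :: p(3), xmin(3), xmax(3), tol
    real(dp) :: xi(3)
    xi = natural_coords_box(p, xmin, xmax)
    inside_or_on_box = all(abs(xi) <= 1.0_dp + tol)
  end function inside_or_on_box
end module box_natural_coords
```

A point on a face yields one natural coordinate equal to plus or minus one, which is how the "lies on one of its surfaces" case is distinguished from a strictly interior node.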

Figure 1. Embedded rebar macro-elements inside hexahedral finite elements [4].


Figure 2. Flow chart of the updated embedded rebar element mesh generation method [4].

As presented in [4], the numerical method for locating and generating the embedded rebars inside hexahedral elements was proposed by Barzegar and Maddipudi [5] as an extension of the work of Elwi and Hrudey [6]. Their mesh generation method [5] has the advantage of allowing arbitrary positioning of the rebars inside the concrete elements and an arbitrary geometry for each hexahedral element. However, the method was found to incur additional computational demand [4] when dealing with a large number of rebars, given that the iterative solution procedure required during the search for intersection points was computationally expensive. The computational performance of this procedure was optimized by Markou [4] through the introduction of a geometric constraint (see Fig. 2) that eliminated unnecessary searches for intersection points between elements located in distant parts of the mesh. Additionally, the method was integrated with an algorithm that determines whether a hexahedral element is symmetric, in which case the natural coordinates of a virtual node are computed explicitly. This approach further increased the computational efficiency of the embedded rebar mesh generation procedure, which proved efficient in handling meshes of up to half a million embedded rebars. Nevertheless, as reported in [4], generating the embedded rebar elements of a double-deck RC bridge model required 6 hours, which is a significant computational time. It is therefore evident that exploiting parallel processing is the next step for decreasing the computational cost of the embedded rebar mesh generation procedure, so that even larger models consisting of billions of rebars can be handled numerically. The objective of this research work is to investigate the parallelization of the algorithm proposed in [4] by using the OpenMP API specification [7] for shared-memory parallelization. The performance of the proposed algorithm was investigated numerically on a large-scale RC structure whose mesh required the generation of 2,703,400 embedded rebar elements.

2 EMBEDDED MESH GENERATION ALGORITHM

Generating the embedded rebar mesh of any 3D detailed model requires pre-processing software to construct the hexahedral mesh and the embedded macro-elements (EMEs). In this research, the commercial software Femap [8] is used to construct the mesh of the RC structure under study, which is then analyzed with the research software Reconan FEA [9], recently integrated with the parallel embedded rebar mesh generation algorithm. Fig. 2 shows the serial algorithm proposed in [4], in which the search for embedded rebar elements is performed by applying the geometric constraint approach and the short embedded rebar filter, both first introduced in [4]. As can be seen in Fig. 2, for each EME, the hexahedral elements of the model that satisfy the corresponding geometric constraint are checked for intersections with that EME. The algorithmic structure of the embedded mesh generation procedure, as described in Fig. 2, allows the entire procedure to be parallelized without the need for any special parallel solution approach. The next section discusses the parallelization features of the proposed algorithm.

3 PARALLEL ALGORITHM

3.1 OpenMP

As stated in [7], the OpenMP API specification provides a model for parallel programming that is portable across shared-memory architectures. One of the main advantages of OpenMP is that it is supported by numerous vendors, so that different compilers can be used to build parallel applications. In addition, OpenMP supports C, C++ and Fortran, and its specification provides support for sharing and privatizing data [7]. Microsoft Visual Studio 2010 and later versions support OpenMP, and their compilers provide an option for enabling or disabling the generation of parallel code from the OpenMP directives. Additionally, developing parallel algorithms with OpenMP is considered to be less demanding than with other solutions such as MPI. Since debugging parallel code under development is considerably more difficult than debugging serial code, commands that are easy to implement and control during programming are of great importance for achieving an error-free product with high performance and scalability. For this reason, OpenMP provides directives that distribute the iterations of "do" (Fortran) or "for" (C, C++) loops among threads without requiring a large number of additional lines of code.
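The serial search structure of Fig. 2, described in Section 2, can be sketched as a double loop in which only the hexahedra that satisfy a geometric constraint are passed on to the intersection search. The constraint used below, an overlap test between the axis-aligned bounding boxes of the EME and of the hexahedron, is an assumed stand-in and not necessarily the constraint of [4]. The derived types eme_t and hexa_t are hypothetical, and the intersection-point search itself is omitted and replaced by a counter.

```fortran
module serial_eme_search
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)

  ! Hypothetical derived types; names and fields are illustrative only.
  type, public :: eme_t
    real(dp) :: p1(3), p2(3)       ! end points of the embedded macro-element
  end type eme_t

  type, public :: hexa_t
    real(dp) :: bmin(3), bmax(3)   ! axis-aligned bounding box of the hexahedron
  end type hexa_t

  public :: passes_constraint, count_candidates
contains
  ! Assumed geometric constraint: the bounding box of the EME segment
  ! must overlap the bounding box of the hexahedron.
  pure logical function passes_constraint(e, h)
    type(eme_t), intent(in)  :: e
    type(hexa_t), intent(in) :: h
    real(dp) :: emin(3), emax(3)
    emin = min(e%p1, e%p2)
    emax = max(e%p1, e%p2)
    passes_constraint = all(emin <= h%bmax .and. emax >= h%bmin)
  end function passes_constraint

  ! Serial double loop: for each EME, only hexahedra that pass the
  ! constraint would proceed to the (omitted) intersection search.
  subroutine count_candidates(emes, hexas, ncand)
    type(eme_t), intent(in)  :: emes(:)
    type(hexa_t), intent(in) :: hexas(:)
    integer, intent(out)     :: ncand(size(emes))
    integer :: i, j
    ncand = 0
    do i = 1, size(emes)
      do j = 1, size(hexas)
        if (passes_constraint(emes(i), hexas(j))) then
          ! The intersection-point search would run here.
          ncand(i) = ncand(i) + 1
        end if
      end do
    end do
  end subroutine count_candidates
end module serial_eme_search
```

Because each outer iteration reads only its own EME and writes only ncand(i), the EMEs can be processed independently, which is the property exploited by the parallel version.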
Furthermore, OpenMP provides the parallel sections construct, which was found in this work to be the most effective OpenMP construct for parallelizing a double "do" loop. Therefore, for the needs of this research work, the sections construct (see Fig. 3) was adopted for distributing the computational load to the selected cores.

```fortran
Subroutine SectionConstructExample
  Implicit None
  !$OMP Parallel Sections
  !$OMP Section
  ! Sec 1
  Call DoWork_1
  !$OMP Section
  ! Sec 2
  Call DoWork_2
  ! . . .
  !$OMP Section
  ! Sec i
  Call DoWork_i
  !$OMP End Parallel Sections
End Subroutine SectionConstructExample
```

Figure 3. OpenMP parallel sections construct in Fortran.
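A minimal sketch of how the sections construct of Fig. 3 could map EME subdomains to cores, assuming four subdomains of nearly equal EME count and a hypothetical kernel process_subdomain that writes only to the entries of its own EMEs, so that no two sections write to the same data. The module and procedure names, and the placeholder work inside the kernel, are illustrative assumptions.

```fortran
module sections_driver
  implicit none
  private
  public :: run_four_subdomains
contains
  ! Hypothetical per-subdomain kernel: each EME ID in [lo, hi] writes
  ! only its own slot of res, so the sections never share written data.
  subroutine process_subdomain(lo, hi, res)
    integer, intent(in)    :: lo, hi
    integer, intent(inout) :: res(:)
    integer :: i
    do i = lo, hi
      res(i) = 2 * i   ! placeholder for the embedded rebar search of EME i
    end do
  end subroutine process_subdomain

  ! Four subdomains of (almost) equal EME count, one per OpenMP section.
  subroutine run_four_subdomains(neme, res)
    integer, intent(in)    :: neme
    integer, intent(inout) :: res(neme)
    integer :: b(0:4), k
    ! Subdomain k holds EME IDs b(k-1)+1 .. b(k)
    do k = 0, 4
      b(k) = (k * neme) / 4
    end do
    !$omp parallel sections default(shared)
    !$omp section
    call process_subdomain(b(0) + 1, b(1), res)
    !$omp section
    call process_subdomain(b(1) + 1, b(2), res)
    !$omp section
    call process_subdomain(b(2) + 1, b(3), res)
    !$omp section
    call process_subdomain(b(3) + 1, b(4), res)
    !$omp end parallel sections
  end subroutine run_four_subdomains
end module sections_driver
```

Since the number of sections is fixed in the source code, the number of subdomains in this sketch is fixed at four. When the code is compiled without OpenMP support, the directives are treated as comments and the subdomains are processed one after another, giving the same result.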


3.2 Proposed Parallel Algorithm

As stated in the previous section, the serial algorithm of Fig. 2 was restructured so that it could be executed in parallel. To achieve this objective, the first step was to identify the parts of the algorithm that account for most of the computational demand of the mesh generation procedure. A numerical investigation showed that the double "do" loop shown in Fig. 2 required most of the computational time of the mesh generation procedure. The second step was to develop the mechanism through which the computational load is distributed to the cores; it was decided to divide the EMEs directly into subdomains containing equal numbers of EMEs. Accordingly, during the preparatory stage of the proposed algorithm, the total number of EMEs was computed and the EMEs were then divided into subdomains based on their ID numbering. It is important to note that the length of each EME determines the number of hexahedral elements that it intersects, and thus controls the number of calculations and the corresponding computational demand of the mesh generation procedure. The computational demand that arises in each search loop was found to be proportional to the length of the EMEs and to the sizes of the corresponding hexahedral finite elements. In this research work, the construction of the subdomains did not account for this factor, which will be a subject of future investigation. Fig. 4 shows the flowchart of the proposed parallel algorithm that was numerically investigated in this research work.
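The equal-count subdivision by ID numbering described above can be sketched as follows. The procedure name split_by_count is hypothetical, and since the exact rounding used in the proposed algorithm is not stated, the sketch assumes contiguous subdomains whose sizes differ by at most one EME. As in the first version of the algorithm, the EME lengths are ignored.

```fortran
module equal_count_split
  implicit none
  private
  public :: split_by_count
contains
  ! Divide EME IDs 1..neme into nsub contiguous subdomains whose sizes
  ! differ by at most one; subdomain k holds IDs lo(k)..hi(k).
  ! EME lengths are deliberately not considered.
  pure subroutine split_by_count(neme, nsub, lo, hi)
    integer, intent(in)  :: neme, nsub
    integer, intent(out) :: lo(nsub), hi(nsub)
    integer :: k
    do k = 1, nsub
      lo(k) = ((k - 1) * neme) / nsub + 1
      hi(k) = (k * neme) / nsub
    end do
  end subroutine split_by_count
end module equal_count_split
```

For instance, for the 177,504 EMEs of the model in Section 4.1 divided into 5 subdomains, this gives one subdomain of 35,500 EMEs and four of 35,501, regardless of how long each EME is.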

Figure 4. Flow chart of the proposed parallel embedded rebar mesh generation procedure.

4 NUMERICAL RESULTS AND DISCUSSION

To investigate the numerical performance of the proposed algorithm, an RC reactor building was discretized and analyzed using the proposed parallel implementation. Section 4.1 presents the model's mesh that was used to assess the numerical performance of the proposed algorithm, while Section 4.2 presents the preliminary computational investigation performed when engaging the proposed parallel embedded rebar mesh generation algorithm to generate the mesh of the building under study.

4.1 Reinforced Concrete Model

Fig. 5 shows a 3D view of the hexahedral mesh of a NUSCALE reactor building that has a total length of 75.25 m and a width of 30 m. The maximum height of the RC structure is 39.55 m, and the 8-noded hexahedral isoparametric finite element was used to discretize the framing system of the building. The 177,504 EMEs that were used during the embedded rebar mesh generation procedure can be seen in Fig. 6, while the details of the constructed mesh are given in Table 1. In order to reduce the effort required for mesh construction, very long EMEs were adopted herein. The longest EMEs are found in the raft slab, the exterior walls and along the roof slab (see Fig. 6).

Figure 5. Reactor Building. Hexahedral finite element mesh.

[Figure 6 labels: Roof Beam Rebars, Roof Slab Rebars, Exterior Wall Rebars, Slab Rebars, Shear Wall Rebars, Arch Beam Rebars, Raft Foundation Rebars]

Figure 6. Reactor Building. Embedded rod elements finite element mesh.

Num. of Hexa Elements                       181,076
Num. of Embedded Macro-Elements             177,504
Num. of Hexa Nodes                          271,226
Num. of Generated Embedded Rebar Elem.    2,703,400
Num. of Short Embedded Rebar Elem.            3,392

Table 1. Finite element mesh details.

4.2 Algorithmic Performance

In order to investigate the performance of the proposed parallel embedded rebar mesh generation algorithm, a standard 8-core CPU system was used, and the embedded rebar mesh generation was performed with a different number of cores in each run. The Intel(R) Xeon processor used herein had a clock speed of 3.70 GHz per core. Fig. 7 shows the results of the 8 analyses that were performed to test the scalability of the proposed parallel algorithm. All 8 analyses were performed twice to ensure that the recorded computational times were reliable and not affected by other applications running in the background. In addition, the CPU used for the analyses was dedicated to this project, and only the minimum required applications were running during the parametric investigation. As can be observed in Fig. 7 and Table 2, the required computational time decreases in inverse proportion to the number of cores for 2, 3 and 4 cores, where the scalability is found to be optimal. This is attributed to the fact that the search for each EME is independent, so no time is spent on reduction operations or on computations that require access to the same matrices or connectivity arrays (minimal communication demands). Moreover, Reconan FEA uses modern Fortran features such as derived data types, which are similar to C structures and share some similarities with C++ classes. Therefore, the data management did not require any additional treatment when the embedded mesh generation procedure was parallelized, since each core receives the variables and data of its own subdomain with minimal communication demands. The proposed parallel algorithm demonstrated maximum scalability when using up to 4 cores, while the code exhibited a lower scalability when 5, 6, 7 and 8 cores were used (see Table 2). This finding was attributed to the fact that the EME lengths varied between 0.7 and 75 m, depending on the geometry of the RC structural members. Since the mesh was divided into subdomains containing equal numbers of EMEs, the EME lengths were not accounted for as a controlling factor during the subdivision procedure. Consequently, some subdomains incorporated EMEs that required a significantly larger number of calculations than others, which affected the overall scalability of the proposed algorithm, especially when the EMEs were divided into 5 subdomains.
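The effect described above can be quantified with a load imbalance ratio. The sketch below assumes, for illustration only, that the work of a subdomain is proportional to the total length of its EMEs, and defines the ratio as the maximum subdomain load divided by the mean load. It also shows one simple way of using the EME length as a weight factor during subdivision, namely a greedy prefix-sum split in ID order; this heuristic is not the method of the paper and does not guarantee a minimum imbalance ratio. All names are hypothetical.

```fortran
module weighted_split
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)
  public :: imbalance_ratio, split_by_length
contains
  ! Assumed load model: subdomain work ~ total length of its EMEs.
  ! Ratio = max subdomain load / mean subdomain load (1.0 = balanced).
  pure function imbalance_ratio(elen, lo, hi) result(r)
    real(dp), intent(in) :: elen(:)
    integer, intent(in)  :: lo(:), hi(:)
    real(dp) :: r, load(size(lo))
    integer :: k
    do k = 1, size(lo)
      load(k) = sum(elen(lo(k):hi(k)))   ! zero for an empty subdomain
    end do
    r = maxval(load) / (sum(load) / size(lo))
  end function imbalance_ratio

  ! Greedy prefix-sum heuristic (not an optimal partitioner): EMEs stay
  ! in ID order, and subdomain k is closed at the first EME whose
  ! cumulative length reaches k/nsub of the total length.
  pure subroutine split_by_length(elen, nsub, lo, hi)
    real(dp), intent(in) :: elen(:)
    integer, intent(in)  :: nsub
    integer, intent(out) :: lo(nsub), hi(nsub)
    real(dp) :: total, acc
    integer :: i, k
    total = sum(elen)
    acc = 0.0_dp
    k = 1
    lo(1) = 1
    do i = 1, size(elen)
      acc = acc + elen(i)
      if (k < nsub .and. acc >= k * total / nsub) then
        hi(k) = i
        k = k + 1
        lo(k) = i + 1
      end if
    end do
    hi(k) = size(elen)
    ! Subdomains that could not be filled are left empty.
    do i = k + 1, nsub
      lo(i) = size(elen) + 1
      hi(i) = size(elen)
    end do
  end subroutine split_by_length
end module weighted_split
```

With EME lengths of 10, 10, 10, 10, 1, 1, 1 and 1 m split into two subdomains, the equal-count split gives loads of 40 and 4 m (ratio 40/22, about 1.82), whereas the length-weighted split gives 30 and 14 m (ratio 30/22, about 1.36).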
The subdivision into subdomains with equal numbers of EMEs therefore created a notable load imbalance, even though there was no significant communication volume during the analysis that would have added further computational demand to the parallel solution procedure. Nevertheless, the CPU parallel efficiency of the proposed algorithm was found to be satisfactory (see Table 2). It should be noted that the last loop of the proposed algorithm (Fig. 4) is executed serially, and its computational time is included in the computational times reported in Table 2. Therefore, the computational time required for the serial part of the mesh generation procedure is not reduced when more than one core is used, which lowers the computed average CPU parallel efficiency. This issue will be addressed in an updated version of the proposed parallel algorithm. Regarding the load imbalance issue discussed above, a direct solution is to use a constant EME length throughout the mesh, which would ensure similar computational demands for all virtual node searches and thus optimal scalability. However, this mesh construction approach would constrain the mesh development stage, so a more general solution that accounts for the length of each EME is deemed appropriate. Using each EME's length as a weight factor to determine the optimal subdomain division during the preparatory stage of the parallel algorithm (see Fig. 4), so that the imbalance ratio is minimized, would provide a comprehensive solution to the load imbalance issue. This is currently a subject of future research work.

[Figure 7 plot: Time in hours vs. Number of Cores; series Run 1 and Run 2]

Figure 7. Computational time vs. number of cores.

Based on the data provided in Table 2, the average computational time required for generating 2,703,400 embedded rebar elements (and discarding 3,392 short embedded rebar elements) was 6.065 hours with the serial code, while the corresponding time decreased to 1.25 hours when 8 cores were deployed (60.4% CPU parallel efficiency). This illustrates the ability of the proposed parallel algorithm to significantly decrease the computational time while achieving a high CPU parallel efficiency, even though the subdomains were not constructed to ensure an even computational load distribution. It is noteworthy that the model studied in this research work is currently the largest model found in the international literature in terms of the number of generated embedded rebar elements. Despite this very large number of embedded rebar elements, the proposed parallel algorithm was able to handle this numerically intensive task in a computationally efficient manner, with a satisfactory CPU parallel efficiency. Finally, Fig. 8 shows the deformed shape of the embedded rebar elements and the von Mises strain contour of part of the roof's hexahedral mesh, as obtained from a static analysis of the structure under its self-weight.

Num. of Cores   Comp. Time Run 1 (h)   Comp. Time Run 2 (h)   Average CPU Parallel Efficiency (%)
1               6.07                   6.06                   Reference
2               3.02                   3.00                   100.8
3               1.93                   1.90                   105.6
4               1.50                   1.48                   101.7
5               1.52                   1.50                   80.3
6               1.44                   1.41                   70.9
7               1.36                   1.35                   63.8
8               1.25                   1.25                   60.4

Table 2. Computational time for generating the embedded rebar mesh for different numbers of cores.
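The text does not state the formula behind the CPU parallel efficiency column. Assuming the usual definition, the serial time divided by p times the parallel time, using the averages of the two runs, the entry for p = 3 is reproduced below; values above 100%, as for 2 to 4 cores, correspond to a speedup larger than the number of cores.

```latex
\begin{align*}
E_p &= \frac{\bar{T}_1}{p\,\bar{T}_p} \times 100\% \\
\bar{T}_1 &= \tfrac{1}{2}(6.07 + 6.06) = 6.065~\mathrm{h}, \qquad
\bar{T}_3 = \tfrac{1}{2}(1.93 + 1.90) = 1.915~\mathrm{h} \\
E_3 &= \frac{6.065}{3 \times 1.915} = \frac{6.065}{5.745} \approx 1.056 \;\Rightarrow\; 105.6\%
\end{align*}
```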

Figure 8. Deformed shape and solid von Mises stress contour of the roof.

5 CONCLUSIONS

A simple and efficient parallel algorithm was proposed for the embedded rebar mesh generation of large-scale RC models that use hexahedral isoparametric finite elements to discretize the concrete domain and model the steel reinforcement as embedded rebar elements. The proposed algorithm uses the OpenMP specification to distribute the computational work among the cores. Its computational performance was investigated through parallel analyses that used 2 to 8 cores. Based on the numerical findings, it was concluded that the proposed parallel algorithm was able to decrease the computational time with a satisfactory CPU parallel efficiency when dealing with a mesh that incorporated more than 2.7 million embedded rebars. Furthermore, it was found that the varying EME lengths in the initial RC mesh affected the scalability of the developed algorithm: subdomains containing very long EMEs, compared with subdomains that had the same number of EMEs but shorter lengths, caused a load imbalance that affected the CPU parallel performance of the proposed algorithm. Nevertheless, the overall scalability of this first version of the parallel embedded rebar mesh generation algorithm was found to be satisfactory. A future objective of this research work is to develop a weight factor that will control the subdomain decomposition and thus ensure a balanced computational load distribution according to the length of each EME. In addition, larger problems will be solved using more cores in order to investigate the scalability of the proposed algorithm beyond 8 cores.
Finally, the use of MPI will also be investigated in order to compare the overall performance of the mesh generation procedure in shared-memory parallel computing environments.


REFERENCES

[1] Markou G. and Papadrakakis M. (2013), "Computationally efficient 3D finite element modeling of RC structures", Computers and Concrete, 12(4):443-498.
[2] Markou G., Mourlas Ch. and Papadrakakis M. (2017), "Cyclic Nonlinear Analysis of Large-Scale Finite Element Meshes Through the Use of Hybrid Modeling (HYMOD)", International Journal of Mechanics, 11:218-225.
[3] Mourlas Ch., Papadrakakis M. and Markou G. (2017), "A computationally efficient model for the cyclic behavior of reinforced concrete structural members", Engineering Structures, 141:97-125.
[4] Markou G. (2015), "Computational Performance of an Embedded Reinforcement Mesh Generation Method for Large-Scale RC Simulations", International Journal of Computational Methods, 12(3):1550019-1:48.
[5] Barzegar F. and Maddipudi S. (1994), "Generating reinforcement in FE modeling of concrete structures", Journal of Structural Engineering, 120:1656-1662.
[6] Elwi A.E. and Hrudey T.M. (1989), "Finite element model for curved embedded reinforcement", Journal of Engineering Mechanics, 115:740-754.
[7] OpenMP Application Program Interface Examples, Version 4.0.0, OpenMP Architecture Review Board, November 2013.
[8] Femap, Siemens Product Lifecycle Management Software Inc., 2017.
[9] ReConAn - Finite Element Analysis Software, v1.00, Institute of Structural Analysis and Seismic Research, National Technical University of Athens, 2010.