© IEEE 2014 978-1-4799-6699-8/14/$31.00
Proceedings of the International Conference DAYS on DIFFRACTION 2014, pp. 209–214

Parallel computing for numerical calculations of step-index optical fibers eigenmodes by collocation method

Spiridonov A.O., Karchevskii E.M.
Department of Applied Mathematics, Kazan Federal University, Russia; e-mail: sasha [email protected]

We study natural modes of weakly guiding optical fibers. The original problem is reduced to a nonlinear nonselfadjoint spectral problem for a set of weakly singular boundary integral equations. The integral operator is approximated by the collocation method. We propose to use the singular value decomposition of the collocation matrix to obtain initial approximations of the eigenvalues. The computations are parallelized with OpenMP and MPI and run on a compact supercomputer.
1 Introduction
Weakly guiding optical fibers are typical components of optical circuits [1]. In recent years, research on the natural modes of arbitrarily shaped optical fibers has focused on the development of efficient and reliable computational methods. Many different numerical techniques are applied for computing eigenmodes of optical fibers, namely finite-element, finite-difference, beam propagation, and spline collocation methods, as well as the multidomain spectral approach (see the reviews in [2], [3]). The most rigorous efforts are connected with integral equation formulations [4]–[7]. In particular, the problem of surface and leaky eigenmodes of a weakly guiding step-index optical waveguide was considered in our previous works [8]–[10]. The original problem was reduced to a nonlinear nonselfadjoint spectral problem for a set of weakly singular boundary integral equations. The integral operator was approximated by the collocation method and by the Galerkin method. The convergence and quality of these numerical methods were confirmed by numerical experiments. The collocation method demonstrated the better speed of convergence.

In this work we develop the collocation method further. The main difficulty in the practical solution of the resulting nonlinear spectral problem is the calculation of good initial approximations for the eigenvalues. We propose to use the singular value decomposition of the collocation matrix to obtain the initial approximations. The calculations use parallel computing technologies and a compact supercomputer.
2 Statement of the problem
Let the three-dimensional space be occupied by an isotropic source-free medium. Consider a regular cylindrical dielectric waveguide. The axis of the cylinder is parallel to the $x_3$-axis, and its cross-section is a bounded domain Ω with a smooth boundary Γ (see Fig. 1). Denote by $\Omega_\infty$ the unbounded domain of the environment. Let the refractive index be prescribed as a positive real-valued piecewise constant function n independent of the longitudinal coordinate $x_3$. Suppose that the function n is equal to a constant $n_\infty$ outside the cylinder and to a constant $n_+ > n_\infty$ inside the cylinder. Eigenvalue problems of dielectric waveguide theory [1] are formulated on the basis of the set of homogeneous Maxwell equations

$$\operatorname{rot} E = -\mu_0 \frac{\partial H}{\partial t}, \qquad \operatorname{rot} H = \varepsilon_0 n^2 \frac{\partial E}{\partial t}. \tag{1}$$

Solutions of (1) that have the form

$$\begin{bmatrix} E \\ H \end{bmatrix}(x, x_3, t) = \operatorname{Re}\left( \begin{bmatrix} E \\ H \end{bmatrix}(x)\, e^{i(\beta x_3 - \omega t)} \right) \tag{2}$$
are called eigenmodes of the waveguide. Each eigenmode is defined by two spectral parameters, ω and β. The positive number ω is the radian frequency; the complex number β is the propagation constant. In eigenvalue problems on eigenmodes of dielectric waveguides it is necessary to calculate the numbers ω and β for which there exist nontrivial solutions of the set of Maxwell equations (1) of the form (2). The solutions have to satisfy a transparency condition at the boundary Γ and a condition at infinity.

Figure 1: The cross-section of a cylindrical dielectric waveguide. [Sketch: domain Ω with $n = n_+$ bounded by Γ, surrounded by $\Omega_\infty$ with $n = n_\infty$; axes $x_1$, $x_2$, $x_3$.]

Using the representation of eigenmodes in the form of single-layer potentials, the original problem under the weakly guiding approximation was reduced in [8] to a nonlinear spectral problem for an integral operator-valued function. A spline-collocation method was proposed in [10] for the numerical solution of this problem, and finally the original problem was approximated by a nonlinear finite-dimensional eigenvalue problem of the form

$$A(\lambda, \chi) u = 0. \tag{3}$$

Here $\chi = \sqrt{k^2 n_\infty^2 - \beta^2}$ is the transverse wavenumber in the cladding of the waveguide, $k = \sqrt{\varepsilon_0 \mu_0}\,\omega$ is the free-space wavenumber, and $\lambda = k^2 (n_+^2 - n_\infty^2)$ is the normalized and squared frequency. We numerically solved problem (3) in [10] by the residual inverse iteration method.

3 Initial approximations for eigenvalues

Any iterative numerical method for computing the nonlinear eigenvalues χ and λ needs good initial approximations. Usually the initial approximations are chosen by physical intuition using a priori information on the solutions. If we model fundamentally new types of waveguides or investigate defects in fibers, then we have no a priori information on the solutions. In this case we can investigate the spectral properties of the matrix A as a function of the variables χ and λ. For each given point in the investigated domain of the parameters χ and λ we calculate the condition number of the matrix A:

$$\operatorname{cond}(A(\lambda, \chi)) = \frac{\rho_1}{\rho_n}, \tag{4}$$

where $\rho_1$ and $\rho_n$ are the maximal and minimal singular values of the matrix. If χ and λ are equal to nonlinear eigenvalues of A, then the matrix is singular, $\rho_n = 0$, and the condition number is infinite. Therefore the numbers χ and λ for which the condition number is sufficiently large are good approximations for the eigenvalues. The singular values are calculated by the singular value decomposition (SVD):

$$A(\lambda, \chi) = U S V, \qquad S = \operatorname{diag}(\rho_1, \rho_2, \ldots, \rho_n), \tag{5}$$

where U and V are unitary matrices and S is the diagonal matrix of the singular values. The calculations are based on unitary transformations of the matrix A and are therefore stable. However, if the condition number is calculated for a wide range of parameters, then a large amount of computation is needed.

Figure 2: The sequential algorithm for numerical calculations of local maxima of the condition number cond(λ, χ) as a function of the variables λ and χ. [Flowchart: Start; geometry and collocation points; loop over λ; loop over χ; A = A(λ, χ); SVD and cond(λ, χ); local maxima of cond(λ, χ) for the given λ; save initial approximations; End.]

Kazan Federal University purchased the compact supercomputer APK-1M as part of the university development program. This supercomputer is now the hardware base of the Kazan Laboratory of Supercomputer Modeling. We are developing software for this supercomputer and have decided to investigate numerically the problem of calculating good initial approximations for the nonlinear eigenvalues. Using parallel computational technologies, we solved this problem directly and without any a priori information.
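The search for initial approximations described in this section amounts to scanning a grid of condition numbers for local maxima. A minimal C sketch of that scan follows, assuming the values of cond(A(λ, χ)) for one fixed λ are already stored in a row-major array. The function name find_local_maxima, the array layout, and the comparison with up to eight neighbors are illustrative assumptions, not the authors' code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the paper): finds strict local maxima of
 * cond on an nre-by-nim grid of chi values for one fixed lambda.
 * cond[i * nim + j] is the value at the i-th node in Re(chi) and the
 * j-th node in Im(chi). Flat indices of the maxima are written to out
 * (at most cap entries); the number of stored entries is returned. */
int find_local_maxima(const double *cond, int nre, int nim, int *out, int cap)
{
    assert(cond != NULL && out != NULL && nre > 0 && nim > 0);
    int count = 0;
    for (int i = 0; i < nre; i++) {
        for (int j = 0; j < nim; j++) {
            double c = cond[i * nim + j];
            int is_max = 1;
            /* Compare with every existing neighbor; nodes on the grid
             * boundary simply have fewer neighbors. */
            for (int di = -1; di <= 1 && is_max; di++) {
                for (int dj = -1; dj <= 1; dj++) {
                    int ni = i + di, nj = j + dj;
                    if ((di == 0 && dj == 0) || ni < 0 || ni >= nre ||
                        nj < 0 || nj >= nim)
                        continue;
                    if (cond[ni * nim + nj] >= c) { is_max = 0; break; }
                }
            }
            if (is_max && count < cap)
                out[count++] = i * nim + j;
        }
    }
    return count;
}
```

A returned index k corresponds to the grid node (k / nim, k % nim). Because the comparison is strict, plateaus of equal values are not reported; boundary nodes are included so that candidates on the line Re χ = 0 are not missed.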
4 Sequential computing

In this section we present our basic sequential algorithm for calculating local maxima of the condition number as a function of the variables λ and χ. Recall that λ is positive; χ is complex for leaky waves and pure imaginary for surface waves. The first step is the definition of the waveguide's geometry and the calculation of the collocation points (see Fig. 2). The loop over λ is the outer loop. For each given λ we make calculations in the inner loop over χ. For given λ and χ we calculate the matrix A. Then we compute the SVD and calculate cond(A(λ, χ)). The next step for the given λ is the numerical calculation of the local maxima of the condition number as a function of the variable χ. Finally, for each given λ we save the local maxima of the condition number as initial approximations for the nonlinear eigenvalues χ.

5 Parallel computing

In this section we present a parallel modification of our basic algorithm for calculating local maxima of the condition number as a function of the variables λ and χ. The APK-1M compact supercomputer is a cluster with two computational nodes. It has eight processors, four on each computational node. Each quadruple of processors shares half of the main memory, which is distributed uniformly between the two computational nodes. Therefore we use both MPI and OpenMP technologies. Message Passing Interface (MPI) is a technology for multiprocessor systems with distributed memory [11]. Open Multi-Processing (OpenMP) is a technology for computers with shared memory [12].

Figure 3: The parallel algorithm for numerical calculations of local maxima of the condition number cond(λ, χ) as a function of the variables λ and χ. [Flowchart: Start; geometry and collocation points; loop over λ; loops over χ distributed with MPI and OpenMP; A = A(λ, χ); SVD and cond(λ, χ); local maxima of cond(λ, χ) for the given λ; save initial approximations; End.]

The loop over λ is sequential (see Fig. 3) because we want to make parallel computations for any specified λ. Hence we parallelize our algorithm over the variable χ. First, we divide the computational domain of χ into four or eight subdomains according to the number of processors in use. At this level of parallelization we use MPI. Second, we divide each subdomain of the variable χ into a fixed number of sub-subdomains according to the number of OpenMP threads. We also use OpenMP for the parallel computation of the entries of the matrix A and for the singular value decomposition.

Let us describe how we use some MPI functions. We use MPI_Bcast to send the variables N, m, and λ to all processes:

    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Bcast(&lambda, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Here N is the number of collocation points and m is the number of nodes in the meshed subdomain of χ for each process. Then we use MPI_Scatter to distribute the meshed subdomains of the variable χ among all processes:

    MPI_Scatter(re_chi, m, MPI_DOUBLE,
                local_re_chi, m, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatter(im_chi, m, MPI_DOUBLE,
                local_im_chi, m, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Finally, we use MPI_Gather to collect the computed condition numbers from all processes in one array:

    MPI_Gather(local_condition, m, MPI_DOUBLE,
               condition, m, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Each process runs on its own processor. Each processor has sixteen computational cores with shared cache memory. For parallel computations on each processor we use sixteen OpenMP threads per process. We use the following OpenMP directive for the loops over the variable χ:

    #pragma omp parallel for private(i)
    for (i = 0; i < m; i++) {
        int idxThread = omp_get_thread_num();
        ...
    }
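The two-level splitting of the χ mesh can be illustrated by plain index arithmetic. The sketch below assumes that the mesh nodes are numbered consecutively and divided into contiguous blocks, first among the MPI processes and then among the OpenMP threads of each process. The names block_t, block_range, and thread_range are hypothetical helpers introduced only for this illustration; in the program described here the distribution itself is performed by MPI_Scatter and by the OpenMP runtime.

```c
#include <assert.h>

/* Half-open index range [begin, end). */
typedef struct { int begin; int end; } block_t;

/* Hypothetical helper (not from the paper): block of worker `rank` when
 * n items are split into `parts` contiguous blocks. The first n % parts
 * workers receive one extra item. */
block_t block_range(int n, int parts, int rank)
{
    assert(n >= 0 && parts > 0 && rank >= 0 && rank < parts);
    int base = n / parts;
    int extra = n % parts;
    block_t b;
    b.begin = rank * base + (rank < extra ? rank : extra);
    b.end = b.begin + base + (rank < extra ? 1 : 0);
    return b;
}

/* Global range of chi nodes handled by thread `tid` of process `proc`:
 * first split among processes, then split the process block among threads. */
block_t thread_range(int total, int nprocs, int proc, int nthreads, int tid)
{
    block_t p = block_range(total, nprocs, proc);
    block_t t = block_range(p.end - p.begin, nthreads, tid);
    t.begin += p.begin;
    t.end += p.begin;
    return t;
}
```

For example, with a 48-by-48 mesh (2304 nodes), four processes, and sixteen threads per process, thread_range(2304, 4, 1, 16, 2) gives the half-open range [648, 684), that is, 36 nodes per thread.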
To identify the sub-subdomain of χ and the dedicated memory array assigned to a thread, we use the function omp_get_thread_num(), which returns the number of the calling thread. We also use the #pragma omp parallel for directive for the loops over the indices of matrix elements and for the loops in the SVD.

6 Comparison of computational technologies and hardware

We started our numerical experiments using a home personal computer with an Intel Core i7 processor (2.90 GHz, 4 physical cores). We fixed λ = 10 and meshed the domain of the variable χ by a 20-by-20 mesh: Re χ = 0 : 0.25 : 5, Im χ = −5 : 0.25 : 0. In these experiments the number N of collocation points was equal to 100. The results are presented in this table:

    Computational technologies      Time (sec.)
    C                               162
    C+OpenMP(A)                     95
    C+OpenMP(χ)                     46
    C+OpenMP(χ)+OpenMP(A)           42

In the table we compare the time of computation for different computational technologies:
• for the usual sequential C program,
• for parallel computing of the elements of A,
• for parallel computing of the matrix elements and on the mesh of χ.
We see that the last parallel program works about four times faster than the sequential program.

We continued our numerical experiments using APK-1M. The abbreviation means a modification of the first hardware-software complex. It was made in Sarov. Let us describe some characteristics of the compact supercomputer APK-1M:
• the peak performance is 1.075 TFlops,
• the main memory is 1024 GB,
• the disc storage is 18 TB.
As we have said, it is a cluster with two computational nodes based on H8QG6-F motherboards. Each computational node has four AMD Opteron 6272 processors with sixteen computational cores each. Therefore one computational node has 64 cores.

For the numerical experiments with the supercomputer we fixed λ = 10 and meshed the domain of the variable χ by a 48-by-48 grid: Re χ = 0 : 0.104 : 5, Im χ = −5 : 0.104 : 0. The results for N = 100 are presented in this table:

    Technologies and hardware       Time (sec.)   Processors   MPI proc.   OMP threads
    PC: OpenMP                      390           1            –           8
    APK-1M: OpenMP                  78            4            –           64
    APK-1M: OpenMP+MPI (1CN)        78            4            4           64

We compare the time of computation for different computational technologies and hardware:
• for OpenMP computations using a home PC with an Intel Core i7 processor,
• for the same computations using APK-1M,
• for OpenMP and MPI computations using APK-1M as described in the previous section.
We see that the parallel programs for APK-1M work five times faster than the OpenMP program for the home PC. Note that we used only one computational node (1CN) for these experiments. If we use two nodes, then the supercomputer works ten times faster than the home PC.

7 Numerical results

In this section we present some numerical results for the 48-by-48 mesh of complex χ and for a 150-node mesh of positive λ. The figures presented at the conference were animated by the change of λ. Fig. 4 shows the frame of this movie for λ = 8.4. In the upper left corner of the figure we present the surface of the inverse condition number as a function of the complex variable χ. In the upper right corner we present the isolines of this function. At the bottom of the figure we present the initial approximations to the nonlinear eigenvalues χ for λ up to 8.4.

Figure 4: Initial approximations to nonlinear eigenvalues χ for a waveguide of the circular cross-section for given λ. [Panels: surface and isolines of 1/cond(A(χ)) for λ = 8.4 over Re χ and Im χ; Re χ and Im χ of the initial approximations versus λ.]

In Fig. 5 we present the initial approximations to the nonlinear eigenvalues χ for λ from 0 to 31. Complex eigenvalues χ correspond to leaky eigenwaves. The numerical results were obtained for a dielectric waveguide of circular cross-section. The exact solution for this case is well known, so we compare the obtained numerical results with it. The exact solution is plotted by blue solid lines (for Im χ) and by blue dashed lines (for Re χ). The SVD results are marked by small red squares. Note that the squares are only initial approximations. They are start points for the inverse iterations that we use for the numerical solution of the nonlinear eigenvalue problem.

Figure 5: Initial approximations to complex eigenvalues χ corresponding to leaky eigenwaves for a waveguide of the circular cross-section. The exact solution is plotted by solid lines (for Im χ) and by dashed lines (for Re χ). SVD results are marked by squares. [Axes: Re χ and Im χ versus λ.]

In Fig. 6 we present the initial approximations to the pure imaginary eigenvalues χ corresponding to surface eigenwaves. We compare the numerical results with the exact solution for the circular waveguide. The exact solution is plotted by solid lines. The SVD results are marked by squares. As in the previous figure, the squares are only initial approximations.

Figure 6: Initial approximations to pure imaginary eigenvalues χ corresponding to surface eigenwaves for a waveguide of the circular cross-section. The exact solution is plotted by solid lines. SVD results are marked by squares. [Axes: Im χ versus λ.]

We used these initial approximations as start points for the residual inverse iteration method. Using this iteration method, for each given λ we solved numerically the nonlinear eigenvalue problem for the eigenvalues χ. In Fig. 7 we present some dispersion curves for surface and leaky eigenwaves of the circular waveguide. The exact solution is plotted by solid lines and by dashed lines. The numerical solutions obtained by the residual inverse iteration method are marked by circles. In Fig. 8 we present some isolines for leaky eigenwaves of the circular waveguide.

Figure 7: Dispersion curves for surface and leaky eigenwaves of the circular waveguide. The exact solution is plotted by solid lines and by dashed lines. The numerical solutions obtained by the residual inverse iteration method are marked by circles. [Axes: Re χ and Im χ versus λ.]

8 Conclusion

Numerical experiments showed the practical effectiveness of our approach, which uses the SVD of the collocation matrix to obtain initial approximations of the nonlinear eigenvalues. Our software package can be used for numerical simulations of new types of optical fibers on the APK-1M supercomputer.
Figure 8: Isolines for leaky eigenwaves of the circular waveguide. [Panels for λ = 10: χ = 4.3 − 1.2i, χ = 3 − 0.9i, χ = 1.4 − 0.4i, χ = 2.1 − 1.1i.]

Acknowledgements

This work was funded by the subsidy allocated to Kazan Federal University for the state assignment in the sphere of scientific activities, and also supported by RFBR and by Government of Republic Tatarstan, grant 12-01-97012-r povolzh’e a.

References

[1] Snyder, A. W., Love, J. D., 1983, Optical Waveguide Theory. Chapman and Hall, London.
[2] Horikis, T. P., 2013, Dielectric waveguides of arbitrary cross sectional shape, Applied Mathematical Modeling, Vol. 37, No. 7, pp. 5080–5091.
[3] Xiao, J., Sun, X., 2010, Full-vectorial mode solver for anisotropic optical waveguides using multidomain spectral collocation method, Optics Communications, Vol. 283, No. 14, pp. 2835–2840.
[4] Karchevskii, E. M., Solov’ev, S. I., 2000, Investigation of a spectral problem for the Helmholtz operator on the plane. Differential Equations. Vol. 36, No. 4, pp. 631–634.
[5] Kartchevski, E. M., Nosich, A. I., Hanson, G. W., 2005, Mathematical analysis of the generalized natural modes of an inhomogeneous optical fiber. SIAM J. Appl. Math. Vol. 65, No. 6, pp. 2033–2048.
[6] Frolov, A., Kartchevskiy, E., 2013, Integral equation methods in optical waveguide theory. Springer Proceedings in Mathematics and Statistics. Vol. 52, pp. 119–133.
[7] Karchevskiy, E., Shestopalov, Y., 2013, Mathematical and numerical analysis of dielectric waveguides by the integral equation method. Progress in Electromagnetics Research Symposium, PIERS 2013 Stockholm, Proceedings, pp. 388–393.
[8] Karchevskii, E. M., 1999, Analysis of the eigenmode spectra of dielectric waveguides, Computational Mathematics and Mathematical Physics. Vol. 39, No. 9, pp. 1493–1498.
[9] Karchevskii, E. M., 2000, The fundamental wave problem for cylindrical dielectric waveguides. Differential Equations. Vol. 36, No. 7, pp. 1109–1111.
[10] Spiridonov, A. O., Karchevskiy, E. M., 2013, Projection methods for computation of spectral characteristics of weakly guiding optical waveguides. Proceedings of the International Conference Days on Diffraction 2013, DD 2013, pp. 131–135.
[11] Gropp, W., Lusk, E., Doss, N., Skjellum, A., 1996, A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing. Vol. 22, No. 6, pp. 789–828.
[12] Yang, L. T., Guo, M., 2006, High-Performance Computing: Paradigm and Infrastructure, John Wiley & Sons, Inc.