FPGA-accelerated Adaptive Optics Wavefront Control Part 2
S. Mauchᵃ, A. Barthᵃ, J. Regerᵃ, C. Reinleinᵇ, M. Appelfelderᵇ and E. Beckertᵇ
ᵃ Control Engineering Group, Computer Science and Automation Department, Technische Universität Ilmenau, Helmholtzplatz 5, 98693 Ilmenau, Germany
ᵇ Fraunhofer Institute for Applied Optics and Precision Engineering (IOF), Albert-Einstein-Str. 7, 07745 Jena, Germany
ABSTRACT
We present progressive work that is based on our recently developed rapid control prototyping (RCP) system, designed for the implementation of high-performance adaptive optical control algorithms using a continuous deformable mirror (DM). The RCP system, presented in 2014, relies on a Xilinx Kintex-7 Field Programmable Gate Array (FPGA), placed on a self-developed PCIe card and installed in a high-performance computer that runs a hard real-time Linux operating system. For this purpose, algorithms for the efficient evaluation of data from a Shack-Hartmann wavefront sensor (SHWFS) on an FPGA have been developed. The corresponding analog input and output cards are designed to exploit the maximum possible performance while, owing to the RCP approach, not being constrained to a specific DM or control algorithm. In this second part of our contribution, we focus on recent results that we achieved with this novel experimental setup. By presenting results which are far superior to the former ones, we further justify the deployment of the RCP system and the time and resources it requires. We conducted various experiments to reveal the effective performance, i.e. the maximum manageable complexity in the controller design that may be achieved in real time without performance losses. A detailed analysis of the hidden latencies is carried out, showing that these latencies have been drastically reduced. In addition, a series of concepts relating to the evaluation of the wavefront as well as to designing and synthesizing a wavefront are thoroughly investigated with the goal of overcoming some of the prevalent limitations. Furthermore, principal results regarding the closed-loop performance of the low-speed dynamics of the integrated heater in a DM concept are illustrated in detail; these are to be combined with the piezoelectric high-speed actuators in the next step.
Keywords: adaptive optics, SHWFS, rapid control prototyping, FPGA, deformable mirror, high power laser processing, performance examination, Linux real-time system
1. INTRODUCTION
The development of fast real-time control for adaptive optics is required for the accurate compensation of atmospheric turbulence and other degradations. Stronger turbulence increases the need for higher control rates and better control accuracy. To enable straightforward integration of different deformable and tip-tilt mirrors for control purposes while avoiding tedious and time-consuming preparation of a setup, the use of a rapid control prototyping (RCP) setup is crucial. In this way, after characterizing the dynamics of selected mirrors with a scanning vibrometer, different control strategies may be tested directly after synthesis and simulation. Testing and optimizing these algorithms is done on the RCP setup such that the implementation effort may be reduced significantly. Fig. 1 shows the principal concept of the RCP approach with an optical breadboard and an FPGA-based RCP system integrated in a hard real-time Linux system. The FPGA-based controller offers real-time wavefront evaluation and input measurement as well as generation of the output signals for control purposes.
Further author information: Send correspondence to Steffen Mauch (E-mail: [email protected])
In our first paper,1 we presented the concept and some arguments for the use of an RCP system. Over the last year, we have tested the setup and gained valuable new information to further improve the system, which is presented in this paper. Further, we have tested the setup with two different sample mirrors, mirror (A) and mirror (B), in order to evaluate its versatility for adaptive optics control-loop development. Sample mirror (A) is a piezoelectrically actuated unimorph setup, whereas sample mirror (B) is a thermally actuated thermal-piezoelectric deformable mirror (TPDM). Both mirrors feature a continuous surface but differ in their characteristic actuator influence functions (AIF).
Figure 1: Principal concept of RCP approach - Deformable Mirror with compact control loop1
The paper is organized as follows: Section 2 presents the algorithms applied for the wavefront evaluation on the FPGA. It also provides detailed information on their benefits and limitations as well as on recently developed enhancements. Section 3 contains a study of the achieved performance in terms of matrix multiplication and resulting delay, which allows the system to be classified and identified. Section 4 shows results that allow the performance of the overall system to be compared with former results. Finally, Section 5 is devoted to the conclusions and an outlook on forthcoming work.
2. FPGA WAVEFRONT EVALUATION
2.1 Idea
The main idea behind the development of the RCP system presented in1 is to decrease the latency as far as possible while guaranteeing a deterministic runtime, thereby supporting better control performance and keeping the benefits of the prototyping capabilities. Therefore, data from a Shack-Hartmann wavefront sensor (SHWFS) is evaluated on an FPGA so as to reduce the transmission delays and the overall latency. In addition, a new method for the analysis of the camera image was developed to further improve the real-time performance of the SHWFS data evaluation.2 The main idea behind this recent approach is that the camera image is no longer segmented into predefined areas that depend on the grid of the lenslets. Instead, the connected-component labeling (CCL) method is used for the evaluation of the spots produced by the lenslets. CCL is an application of graph theory and may be implemented as a single-pass algorithm, which means that the labeling step requires only one pass over the image, without the need to store the camera image for a second pass. The main benefit is that, except for the time required for transmitting the pixel data from the camera to the FPGA, almost no additional time is necessary. The time line of this approach, including the required exposure time, is shown in Fig. 2.
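The single-pass idea can be illustrated in software. The following Python sketch is not the FPGA implementation; it assumes 4-connectivity, a fixed global threshold, and intensity-weighted centroids, and the name `ccl_centroids` is hypothetical. It consumes the pixels in raster order, keeps only the labels of the previous row, accumulates the centroid sums while the pixels arrive, and leaves only the divisions for the end.

```python
def ccl_centroids(image, threshold):
    """Single-pass, 4-connected labeling with running centroid sums.

    image: list of equally long rows of pixel intensities, read in raster order.
    Returns a list of (x, y) intensity-weighted centroids, one per blob.
    """
    parent = []  # union-find forest over provisional labels
    sums = []    # per root label: [sum I, sum I*x, sum I*y]

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    prev = []  # labels of the previous row (-1 means background)
    for y, row in enumerate(image):
        cur = [-1] * len(row)
        for x, val in enumerate(row):
            if val <= threshold:
                continue
            left = cur[x - 1] if x > 0 else -1
            up = prev[x] if prev else -1
            if left < 0 and up < 0:
                label = len(parent)  # start a new blob
                parent.append(label)
                sums.append([0, 0, 0])
            elif left >= 0 and up >= 0:
                label, other = find(left), find(up)
                if other != label:  # two blobs touch: merge their sums
                    parent[other] = label
                    for i in range(3):
                        sums[label][i] += sums[other][i]
                        sums[other][i] = 0
            else:
                label = find(left if left >= 0 else up)
            cur[x] = label
            acc = sums[label]
            acc[0] += val
            acc[1] += val * x
            acc[2] += val * y
        prev = cur

    # Only the divisions remain once the last pixel has arrived.
    return [(s[1] / s[0], s[2] / s[0])
            for lab, s in enumerate(sums) if parent[lab] == lab and s[0] > 0]
```

For a small test image with a 2×2 blob of intensity 9 and a 1×2 blob of intensity 5, as used in the check below, the function returns `[(1.5, 1.5), (4.0, 2.5)]`. Because the sums are accumulated per pixel, the work after the last pixel is independent of the image size, which mirrors the argument made above.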
[Figure 2: timing diagram; horizontal axis: time in µs, 100 to 1100; bar labels: unchangeable, Exposure Time, CameraLink, CCL, centroid calc., ordering x, ordering y, segmentation x, segmentation y, matrix sorting]
Figure 2: Time evolution (each step rounded up to 25 µs) during image processing
The required time is subdivided into the following sections: exposure (100 µs), processing (675 µs), and evaluation (225 µs). We use a standard IMPERX Bobcat B0620M camera, available from Imagine Optics as the HASO™ 3 Fast SHWFS sensor. In view of Fig. 2, it is clear that when the framerate, i.e. the number of frames per second (FPS), is decreased to a value smaller than 1000 Hz, we obtain a delay of exactly one frame for control purposes, assuming the controller is executed with the same frequency and phase. In this case, the evaluation and transmission take less time than is available until the next trigger. Otherwise, the delay may be two frames. However, since the SHWFS has a maximum framerate of 905 Hz, depending on the trigger mode, the delay is exactly one frame except when the controller rate is not an integer multiple of the framerate. The Bobcat B0620M camera may be configured either in free-running mode, i.e. the camera determines its own framerate, or in triggered mode. In triggered mode, different kinds of trigger inputs can be selected, e.g. external, computer or software trigger. The computer trigger evaluates one input of the CameraLink interface; in this way, the synchronization between camera and real-time program is achieved. The real-time program writes into a special register of the PCIe card, which the FPGA translates into the correct signal that is finally transmitted over CameraLink. Furthermore, the behavior of the trigger mode is selectable. In standard mode, the camera idles and waits for the trigger signal. Upon receiving the trigger, the register is cleared, the exposure begins, and afterwards the transmission takes place; see again Fig. 2.
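The one-frame versus two-frame argument is simple arithmetic on the durations read from Fig. 2. The following Python sketch uses the three durations quoted above and assumes, as in the text, that the controller runs with the same frequency and phase as the camera trigger; the function name `frames_of_delay` is hypothetical.

```python
import math

EXPOSURE_US = 100    # exposure time, from Fig. 2
PROCESSING_US = 675  # processing time, from Fig. 2
EVALUATION_US = 225  # evaluation time, from Fig. 2


def frames_of_delay(framerate_hz):
    """Whole frame periods that elapse until the centroids are available."""
    total_us = EXPOSURE_US + PROCESSING_US + EVALUATION_US  # 1000 µs in total
    period_us = 1e6 / framerate_hz
    return math.ceil(total_us / period_us)
```

At the maximum SHWFS framerate of 905 Hz the frame period is about 1105 µs, so `frames_of_delay(905)` gives 1. A hypothetical framerate of 1200 Hz, which this sensor does not reach, would give 2.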
In the fast synchronized triggering mode, the exposure is started a specified time after the trigger pulse, but the previously collected data are transferred directly after the trigger event. This mode allows higher framerates when the exposure time is comparably long and may be used to minimize the delay when the delay is not a multiple of the frame period. The main benefit of the CCL approach is that the centroids are determined in parallel with the transfer of the camera pixel stream. Only the last step, the division, has to be processed after the transmission of the pixel stream has finished; this delay may be minimized further.2 The subsequent step is the ordering and segmentation of the blobs/centroids such that each blob is assigned to the corresponding lenslet, which makes it possible to calculate the slopes and reconstruct the wavefront.2 The last step is writing the centroids as a matrix into a FIFO∗. The FIFO is required to realize the transition between the camera clock and the clock of the PCIe endpoint, which run at different rates. Then, the PCIe endpoint may transmit the data via DMA† without any further delay.
∗ FIFO is an abbreviation for First In, First Out. It is a method for organizing and manipulating a data buffer where the oldest (first) entry is processed first.
† Direct memory access (DMA) allows writing to a specified memory region without intervention of the processor.
2.2 Enhancement
The evaluation discussed above has been further revised by improving the ordering of the centroids with respect to the lenslets.3 Instead of segmenting along straight lines, a modified spiral algorithm has been applied for the segmentation. The spiral algorithm4 has advantages because it exploits similarity for assigning the spots correctly to the lenslets. An overview of different ordering algorithms has been published recently.5 However, these algorithms are not real-time capable, are computationally intensive and, in addition, most of them are not even deterministic. In view of its characteristics and performance, we selected the spiral algorithm and developed a modified spiral algorithm that is real-time capable and deterministic in its run-time behavior.3 Furthermore, due to the typically Gaussian intensity profile (see e.g. Fig. 3a for an SHWFS image captured in our optical setup), the centroids/spots at the corners of the image are not as bright as those in the center. Additionally, the border of the intensity pattern of a spot in the middle may show similar brightness to a centroid/spot at the corner, resulting from the spatial sampling by the lenslet array. Global thresholding with default values may thus miss the spots at the corners, see Fig. 3b. Furthermore, when the threshold is chosen very low, non-existing spots may be detected as centroids by the CCL, which leads to failure of the spot sorting algorithm.
[Figure 3 panels: (a) measured SHWFS camera image, (b) global threshold, (c) adaptive threshold]
Figure 3: Difference between global and adaptive thresholding when a Gaussian intensity profile is prevalent
In the field of image processing, several adaptive thresholding methods have been proposed.6 Nevertheless, methods such as adaptive Gaussian thresholding work properly for background removal only on backgrounds that are at least locally uniform. In our case, removing the background does not solve the problem. Therefore, our approach is to approximate the inverse of the global intensity profile (ideally, an inverse Gaussian distribution) during the thresholding. Thus, centroids at the corners may also be detected without a globally low threshold value, see Fig. 3c. The threshold value applied to Fig. 3b is identical to the value in the middle of Fig. 3c.
For transmitting the centroid data to the real-time program on the performance computer, the developed FPGA card is connected via PCIe 2.0 x4. Through this high-speed interconnection, the raw centroids, the ordered/segmented centroids or even the complete image may be transferred (via DMA) into the main memory of the computer with minimum delay and no additional effort for the central processing unit (CPU). The DMA has been realized using the so-called busmaster application of the PCIe FPGA card, which is triggered by internal signals that are set when the individual calculation or the reception of a frame has been completed. As mentioned, a new frame of the SHWFS is started by generating a trigger signal; the real-time program determines the framerate of the SHWFS, up to 900 Hz, by appropriate generation of the trigger signal. Due to this mechanism, the controller of the closed loop is synchronized to the camera, with the consequence that time-varying delays,7, 8 caused by different sampling rates and unknown phases between the sampling frequencies,
are diminished. This reduction enhances the performance and simplifies the design of robust controllers that may stabilize the system even when it is subject to model uncertainties. The close integration of the PCIe FPGA card into the real-time Linux system further supports the development of new approaches, e.g. for the segmentation or the blob detection, which may now be tested easily in software. In this regard, one possibility is to use a Simulink model and employ the Simulink Coder for direct C/C++ code generation; another is any other user-space program. This is possible because the individual measurement values are available either directly in Simulink via Comedi‡ or via the ‘/proc’ filesystem for easy access in a terminal session. Once the new algorithms have been evaluated successfully, they may, in a further step, be implemented directly on the FPGA of the PCIe FPGA card in order to use the hardware acceleration for reducing the latency.
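The position-dependent thresholding of Section 2.2 can be sketched in a few lines of Python. This is an illustration under assumptions, not the FPGA implementation: the intensity profile is modeled as an ideal, centered Gaussian with a hypothetical width parameter `sigma`, and scaling the threshold by that profile stands in for multiplying the image by the inverse profile. The threshold in the image center equals the global value `t_center`, in line with the statement about Fig. 3b and Fig. 3c.

```python
import math


def threshold_map(width, height, t_center, sigma):
    """Per-pixel threshold that follows an assumed centered Gaussian profile."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[t_center * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def binarize(image, thresholds):
    """Compare every pixel with its own threshold (1 = spot, 0 = background)."""
    return [[1 if pixel > t else 0 for pixel, t in zip(row, t_row)]
            for row, t_row in zip(image, thresholds)]
```

With a 5×5 map, `t_center = 100` and `sigma = 2`, the threshold drops from 100 in the center to about 36.8 in the corners, so a corner spot of intensity 50 is kept although a global threshold of 100 would discard it.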
3. BENCHMARK
The performance computer is based on an Intel Core i7 4771 (Haswell architecture) mounted on an Asus H87Pro main-board and runs Ubuntu 14.04 LTS with Linux kernel 3.10.18, patched with RTAI 4.0, as its operating system. Several performance tests have been conducted on it. The following results confirm the efficiency of the overall system and show that it is capable of handling large controller orders without degrading performance or latency.
3.1 Matrix multiplication
Fig. 4 depicts the time required for the multiplication of square matrices of different dimensions. The usual complexity of a matrix multiplication is $O(n^3)$, where $n$ is the dimension of the matrix. There are also algorithms such as the Strassen algorithm with $O(n^{2.807})$ or the optimized CW-like algorithm9 with a complexity of $O(n^{2.373})$, but these in general have special requirements, e.g. regarding the dimension $n$, and/or are not easy to implement. The schoolbook matrix multiplication itself requires $n^3$ multiplications and $(n-1)n^2$ additions. Fig. 4 demonstrates that MATLAB is able to outperform the implemented stand-alone matrix multiplication for large dimensions. This is due to the fact that MATLAB uses the Intel® Math Kernel Library (MKL)§, which is highly optimized to utilize all available capabilities of Intel processors. For lower dimensions, the overhead of MATLAB distorts the results because only the tic/toc commands of MATLAB have been used for measuring the elapsed time. Both other variants have been coded in C++ using Eigen3, a C++ template library for linear algebra that is optimized to use the available extensions of the processor. When the Advanced Vector Extensions 2 (AVX2) and Fused-Multiply-Add (FMA) extensions are used, the performance is significantly better. Note that the Haswell architecture is the first generation of Intel processors supporting AVX2 and FMA. In the figure, the error bars represent the minimum and maximum values of the required run-time. Performing matrix multiplications efficiently is an absolute necessity when controlling deformable mirrors. In this context, the so-called influence matrix carries in its columns the influence functions of the individual actuators of the DM, i.e. the deformation that one single actuator produces. During control, it is important to know which actuators should be deflected, which means that the inverse problem has to be addressed.
More precisely, the Moore-Penrose pseudo-inverse is precomputed and the actual measurements of the SHWFS are multiplied by the pseudo-inverse so as to obtain the required voltages.10 When a transformation into zonal coordinates is applied, even more multiplications are required. When the Simulink Coder is used for compiling a Simulink model with a matrix multiplication into C/C++ code, the result is likely to be disappointing. The Simulink Coder does not use vectorization, i.e. the SSE/2/3/4 and AVX/2 instruction sets are not employed for accelerating the computation. Even on an adequate processor, the highly optimized MATLAB implementation is not used when generating C/C++ code with the Simulink Coder. The automatic vectorization support, which is available e.g. when using GCC¶, is usually not able to vectorize the code either, because no additional information about alignment and ordering is specified. To overcome this limitation and achieve good performance with the RCP approach, we used the Eigen3 math library to write a Simulink S-function in C++ which accelerates the matrix multiplication by replacing the non-vectorized C code generated by the Simulink Coder. With Eigen3, the effort for this task is quite small. As a result, even calculations involving 200×200 matrix multiplications become possible at loop rates above 1000 Hz, see Fig. 4. Note that the comparison in Fig. 4 refers to the stand-alone implementation in C++ without the Simulink Coder. To demonstrate the efficiency when using the Simulink Coder, a Simulink model has been developed which multiplies a 50 × 50 matrix. With the default matrix multiplication, the time for executing one task step was approximately 140 µs, whereas with the Eigen3-based multiplication it was only approximately 50 µs. These values are distinctly larger than the values in Fig. 4, which is due, among other things, to the included scheduler overhead and other functionality.
‡ Comedi is a collection of drivers for measurement devices; the drivers are implemented as Linux kernel modules that provide common functionality and are real-time capable.
§ Intel® Math Kernel Library, fastest and most used math library for Intel and compatible processors, https://software.intel.com/en-us/intel-mkl/
[Figure 4: bar chart; vertical axis: time in µs, 0 to 1,500; matrix dimensions 50x50, 100x100, 200x200; series: Eigen3 (SSE3), Eigen3 (AVX2+FMA3), MATLAB (2013b single thread); data labels in µs: 10, 13, 23, 59, 67, 164, 451, 551, 1,417]
Figure 4: Matrix multiplication benchmark for different matrix dimensions; average runtime with 10000 runs in total; stand-alone C++ program, compiled with GCC 4.8.1, and MATLAB R2013b running on Linux (same system configuration)
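The operation count quoted in Section 3.1 for the schoolbook multiplication can be checked with a short Python sketch. It only illustrates the counting argument and is not related to the Eigen3 or MATLAB code that was benchmarked; the function name `matmul_counted` is hypothetical.

```python
def matmul_counted(a, b):
    """Schoolbook product of two matrices given as lists of rows.

    Returns (c, multiplications, additions) so the counts can be inspected.
    """
    n, m, p = len(a), len(b), len(b[0])
    mults = adds = 0
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = a[i][0] * b[0][j]  # first product needs no addition
            mults += 1
            for k in range(1, m):
                acc += a[i][k] * b[k][j]
                mults += 1
                adds += 1
            c[i][j] = acc
    return c, mults, adds
```

For two 4×4 matrices this yields 64 multiplications and 48 additions, i.e. $n^3$ and $(n-1)n^2$ for $n = 4$.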
3.2 State-Space evaluation
In7 we proposed the H∞ synthesis method to devise a state-space controller of predefined fixed order. This order depends on the number of actuators and outputs, and also on the plant. When a PI(D) controller is used instead, such controllers are usually given in terms of a transfer function in the frequency domain. Equivalently, these transfer functions can be represented as state-space models in the time domain.
¶ The GNU Compiler Collection (GCC) is a compiler system produced by the open-source GNU Project that supports various programming languages. Originally, GCC stood for GNU C Compiler, as it handled only the C programming language.
Using the state-space representation has some computational benefits, such as better numerical stability. Additionally, these models may easily be discretized, e.g. by using the rectangle approximation, when the chosen sampling rate is sufficiently fast. The state-space equations can be written as follows:
\[
\begin{aligned}
\dot{x}(t) &= A x(t) + B u(t), \qquad x(t_0) = x_0 \\
y(t) &= C x(t) + D u(t)
\end{aligned}
\tag{1}
\]
where $x(t) \in \mathbb{R}^n$ are the states, $y(t) \in \mathbb{R}^p$ the output and $u(t) \in \mathbb{R}^q$ the input. Using Eigen3 to reduce the computation time, as previously done for the matrix multiplication, also results in an improvement, but with a less pronounced difference in time due to the lower complexity of the required calculations. In general, the number of required multiplications is $O(n^2 + q \cdot n + n \cdot p + p \cdot q)$, while $O(n \cdot (n-1) + n \cdot (q-1) + p \cdot (n-1) + p \cdot (q-1))$ additions are required. Equation (1) can be divided into several matrix multiplications depending on the dimensions of $x(t)$ and $u(t)$. For example, when $n = 48$ and $q = 24$, $Bu(t)$ is a multiplication of a matrix in $\mathbb{R}^{n \times q}$ by a vector in $\mathbb{R}^{q \times 1}$. The corresponding C++ code, an example Simulink model for the matrix multiplication, and the state-space implementation are available at https://github.com/steffenmauch/Simulink-Eigen3.
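As an illustration of the rectangle approximation mentioned above, the following Python sketch performs one forward-Euler step of equation (1) and counts the multiplications of the four matrix-vector products. It is a plain-Python stand-in, not the Eigen3 code from the repository; the function names and the step size `dt` are assumptions made for this example.

```python
def mat_vec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def euler_step(A, B, C, D, x, u, dt):
    """One rectangle-rule (forward Euler) step of equation (1).

    Returns the next state x[k+1] = x[k] + dt * (A x[k] + B u[k])
    and the current output y[k] = C x[k] + D u[k].
    """
    ax, bu = mat_vec(A, x), mat_vec(B, u)
    y = [cx_i + du_i for cx_i, du_i in zip(mat_vec(C, x), mat_vec(D, u))]
    x_next = [x_i + dt * (a_i + b_i) for x_i, a_i, b_i in zip(x, ax, bu)]
    return x_next, y


def multiplications_per_step(n, q, p):
    """Multiplications in A x, B u, C x and D u for n states, q inputs, p outputs."""
    return n * n + n * q + p * n + p * q
```

The scaling by `dt` adds a further n multiplications per step; it can be avoided by folding `dt` into the matrices once before the loop starts.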
3.3 Resulting delays
In this subsection, the delays associated with the individual components are specified and discussed. For synthesizing a robustly stabilizing controller, reliable information about the delay is indispensable.
3.3.1 SHWFS evaluation
Fig. 2 makes clear that the time required for the SHWFS evaluation is 1000 µs. Of course, the time delay due to the exposure of the pixels may be reduced either by increasing the intensity of the separated beam or by using another camera with higher photon efficiency. The ‘unchangeable’ bar in Fig. 2 should be understood to mean that the processing time is immutable unless the SHWFS itself or the optical components are changed.
3.3.2 DMA transfer of centroid data
The chipset of the main-board supports a maximum transaction layer packet (TLP) size of only 128 bytes. Since each TLP requires an individual header, writing to a 32-bit data address in the main memory leaves only 116 bytes for the user data. Therefore, the actual bandwidth is only 116/128 ≈ 90% of the burst rate, without the PCIe protocol overhead and without any other disturbance. Hence, transferring the data for the centroids or the whole image requires the time given in Tab. 1, which agrees with measurements of the corresponding signals to within +5 %.
Table 1: Required time for data transfer between PCIe and main memory via DMA
data transmission   # of data      req. time PCIe 1.0 x4   req. time PCIe 2.0 x4
centroids           1024 byte      1.08 µs                 0.54 µs
image               100352 byte    106 µs                  53.1 µs
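The packet arithmetic behind the 90% figure can be written out as a short Python sketch. It uses only the two sizes stated in the text (128 bytes per TLP, of which 116 bytes are user data) and deliberately does not attempt to reproduce the transfer times in Tab. 1, since the link rate assumed for those values is not given here; the function names are hypothetical.

```python
TLP_SIZE_BYTES = 128   # maximum TLP size supported by the chipset (from the text)
USER_DATA_BYTES = 116  # user data carried per TLP (from the text)


def payload_efficiency():
    """Fraction of the burst rate that is available for user data."""
    return USER_DATA_BYTES / TLP_SIZE_BYTES


def tlp_count(num_bytes):
    """Number of TLPs needed to move num_bytes of user data (integer ceiling)."""
    return -(-num_bytes // USER_DATA_BYTES)
```

Under these figures, the 1024 bytes of centroid data are split into 9 packets and the 100352 bytes of a full image into 866 packets, while the efficiency evaluates to 0.90625.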
3.3.3 Real-Time System Latency
RTAI, together with the Linux kernel, contributes some latency as well. This latency is load dependent. Interrupts which have a higher priority than the real-time program itself also contribute to the latency, e.g. non-maskable interrupts of the power management. The mean latency over one execution, combined with the maximum and minimum (blue) and the absolute worst case (red), is depicted in Fig. 5. The latency has been determined while the DMA of the PCIe FPGA card was active and the processor was forced to 100% load on all available cores. Furthermore, context switches and X11 server activity were generated in the meantime to cover the worst case as well. Altogether, the test was carried out over two hours. Note that this latency may be considered critical because it influences the integration step and propagates directly to the SHWFS, since the real-time program triggers the start of a new frame.
[Figure 5: RTAI latency; horizontal axis: time in µs, 0 to 6; data labels: 0.38 and 5.2]
Figure 5: Latency of RTAI task execution at 10 kHz rate of the LXRT task
4. RESULTS
4.1 Sample mirror (A)
The sample mirror (A) is a unimorph mirror with a 2 mm-thick polished glass substrate (B33) and an adhesively bonded piezoelectric disk (PIC 151, PICeramic GmbH) with a thickness of 400 µm. The top electrode of the piezoelectric disk features 16 actuators outside the aperture and 24 actuators inside the aperture. Fig. 6b shows the pie-slice actuator setup. This mirror is electrically contacted and mounted via 20 compliant cylinders on the mirror’s rear surface. Each actuator may be driven with 2 kV/mm, which corresponds to 800 V.
4.1.1 Piezoelectric characterization of the sample mirror (A)
Fig. 6a shows the static influence of the individual actuators and their influence on neighboring actuators; Fig. 6b shows the actuator arrangement and the pattern numbering. In Fig. 6a, the 24 actuators inside the aperture are marked with the white line. The deflection of each actuator has been normalized to unity; thus, the main diagonal is equal to one. This normalization is also applied to the identified model because the application of the pseudo-inverse already incorporates the amplitude.10 For a first test, only the inner 24 actuators have been considered for control purposes, as they are directly measurable and therefore easier to control. The first column in Fig. 6a shows the strong coupling of actuator one to actuators two and eight. This is not surprising because they are direct neighbors of actuator one. In general, the local coupling of an actuator to its neighbors is visible in the figure, as is the superimposed global coupling of the actuators. Note that the secondary diagonal is not parallel because the actuator spacing is not regular. However, it is visible that the direct neighbors are almost identically coupled regardless of their position. Due to the relatively strong coupling, the static influence function does not show very small entries. The smallest entry is ≈ 0.2.
Thus, when no local decoupling or any other decoupling technique is applied, the complexity of the control problem is very high. Neglecting the coupling may degrade the performance or even result in instability. When decoupling, particular attention has to be paid to the actuation values, since the decoupling may exceed the actuation limits. Hence, simulations are necessary to show that the specified inputs remain within the limits. For the controller design, we resorted to an H∞ non-smooth µ-synthesis method for obtaining a fixed-order H∞ controller that features good properties regarding robustness and performance.7 Applied to the underlying problem, the approach results in a high-dimensional synthesis problem whose solution requires several hours of calculation on a state-of-the-art high-performance computer. The problem was solved with MATLAB and the Robust Control Toolbox (hinfstruct).
[Figure 6 panels: (a) normalized influence function, both axes: number of actuator, 10 to 40, color scale 0 to 1; (b) actuator layout - front view]
Figure 6: Visualization of the static influence function and the respective actuator layout of the DM
Fig. 7 indicates that the controller is able to tackle the coupling. However, there is some small influence between the individual channels due to model inaccuracy. Within the synthesis of the fixed-order controller (dimension 48), the identified model of the DM (transfer function $G(s) \in \mathbb{R}(s)^{24 \times 24}$) is supposed to be uncertain, additively perturbed by an uncertain gain-bounded matrix (with transfer function $\Delta \in \mathbb{C}^{24 \times 24}$, $\|\Delta\|_\infty \le 0.1$). The actuator model may thus be written as $G_{\text{total}} = G + \Delta$. Since the DM model $G$ is normalized to unity, the uncertainty may be interpreted as a maximum deviation of 10% in the individual actuators.
The DM has been biased with 150 V and operated between 0 V and 300 V such that positive as well as negative deflection is achievable within the operating range of the voltage supply. The possibility to deflect in both directions is essential to cope with the coupling (see Fig. 8). In view of the maximum voltage of 300 V, the deflection of the individual actuators is rather low (≈ 480 nm) in absolute terms. Note that usual H∞ controllers do not have an integral control action. Therefore, a small deviation is always present (non-zero steady-state error). When appropriate weighting matrices are chosen for the mixed-sensitivity approach, however, pseudo-integral behavior is achievable. In our case, the deviation is smaller than 0.05% according to Monte Carlo experiments. The steady-state error is less important here, since a system subject to fast, non-static disturbances will not reach a steady state anyway.
As a result, 90-95% of the disturbance is compensated after approximately 11-13 milliseconds, see Figs. 7 and 8. Faster compensation is possible by choosing other weighting matrices, but a somewhat slower controller has to be tested thoroughly beforehand in order to gain better insight into the accuracy of the DM model. Fig. 8 also illustrates that when an error of −0.2 is measured at actuator one, an actuation value of 0.2 is not sufficient. Instead, almost 0.8 is required, and more than 0.2 for actuators two and eight as well. This points out the strong coupling among the compartments of the DM. Thus, the maximum possible compensation when the disturbance is measured only at actuator one is almost exhausted. The remaining actuators are also controlled, but only by a very small amount. The jitter results from the fact that measurement noise is modeled. A simulation study has been undertaken for a varying DM model to examine whether the performance is as desired under the assumed uncertainties.
[Figure 7: 4×4 grid of step-response plots; columns From: In(1) to From: In(4), rows To: Out(1) to To: Out(4); horizontal axes: time in seconds, 0 to 0.02; vertical axes: 0 to 1]
Figure 7: Simulated step-response of the closed-loop for actuator one to four
[Figure 8: normalized acting value, −0.5 to 0.5, over time in seconds, 0.12 to 0.2; curves: act. 1, act. 2, act. 8]
Figure 8: Simulated actuation values for a step on actuator one with -0.2 height at 0.15 s; simulated with noise
4.2 Sample mirror (B)
The sample mirror (B) is a TPDM unimorph mirror as presented in.11, 12 The substrate material is low-temperature co-fired ceramic (LTCC) with a thickness of 240 µm and a fixed rim. A 100 µm-thick copper layer is applied so as to improve the mirror's heat-spreading properties. The optical aperture is 22.5 mm, identical to that of sample mirror (A), but the piezoelectric actuator layout is different. Here, however, we use the mirror’s thermal actuation properties.
4.2.1 Integrated heaters in the DM

The DM is equipped with five integrated heaters and thermal sensors. The arrangement of the heaters is illustrated in Fig. 9.

Figure 9: Outline of the arrangement of the heaters and temperature sensors (labels: Heater 1 to Heater 5, Sensor 1 to Sensor 5)

The four segment heaters allow the temperature of each of the four segments to be controlled separately. Additionally, a circular heater is used to raise the temperature level of the whole mirror uniformly. The temperature may be measured separately by five negative temperature coefficient (NTC) thermistors. Each segment has its own sensor, associated with heaters one to four, respectively. The fifth sensor is placed in the center of the mirror and has the same distance to each heater.

The heater of the DM is modeled as a linear time-invariant multiple-input multiple-output (MIMO) state-space system without throughput, recall equation (1). In the heater case, the state $x(t) \in \mathbb{R}^5$ represents the temperatures at the sensors and $u(t)$ is the square of the input voltage at the corresponding heater. The output $y(t) \in \mathbb{R}^4$ denotes the temperatures at sensors one to four. The matrices $A$, $B$, and $C$ are of appropriate dimensions.

In a first step, a MIMO controller is designed for regulating the mirror to a desired temperature level. To compensate for model uncertainties and external disturbances, the form of equation (1) is extended by an integral term for every output channel. That is,

\[ \dot{\bar{x}}(t) = \bar{A}\,\bar{x}(t) + \bar{B}\,u(t), \qquad y(t) = \bar{C}\,\bar{x}(t) \tag{2} \]

with

\[ \bar{A} = \begin{pmatrix} A & 0_{5\times 4} \\ C & 0_{4\times 4} \end{pmatrix}, \quad \bar{B} = \begin{pmatrix} B \\ 0_{4\times 5} \end{pmatrix}, \quad \bar{C} = \begin{pmatrix} C & 0_{4\times 4} \end{pmatrix}. \]
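The block structure of the augmented system in equation (2) can be sketched in a few lines of NumPy. This is only an illustration: the numerical values of A, B, and C below are hypothetical placeholders (the identified thermal model is not given in the text), and the function name `augment_with_integrators` is chosen here for illustration.

```python
import numpy as np

def augment_with_integrators(A, B, C):
    """Build (A_bar, B_bar, C_bar) of equation (2): one integrator per output."""
    n = A.shape[0]  # number of plant states (5 in the heater case)
    m = B.shape[1]  # number of inputs (5 heaters)
    p = C.shape[0]  # number of outputs (4 segment sensors)
    A_bar = np.block([[A, np.zeros((n, p))],
                      [C, np.zeros((p, p))]])
    B_bar = np.vstack([B, np.zeros((p, m))])
    C_bar = np.hstack([C, np.zeros((p, p))])
    return A_bar, B_bar, C_bar

# Hypothetical placeholder model, NOT the identified mirror model:
# weakly coupled first-order temperature dynamics.
A = -0.5 * np.eye(5) + 0.05 * (np.ones((5, 5)) - np.eye(5))
B = np.eye(5)
C = np.eye(4, 5)  # outputs are the first four sensor temperatures

A_bar, B_bar, C_bar = augment_with_integrators(A, B, C)
print(A_bar.shape, B_bar.shape, C_bar.shape)
```

With five states and four outputs, the augmented state has dimension nine, which matches the size of the gain matrix used in the controller design that follows.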
A linear quadratic regulator (LQR) has been designed for controlling the system, see [13]. The result is the controller gain matrix $K \in \mathbb{R}^{5\times 9}$ parameterizing the state feedback controller

\[ u = -K\,\bar{x} + F\,r. \tag{3} \]

For commanding desired values for the output temperatures, a reference input $r$ is included in the controller. Under the assumption of a steady state, the matrix $F$ may be computed with

\[ F = -\left( \bar{C} \left( \bar{A} - \bar{B} K \right)^{-1} \bar{B} \right)^{+} \tag{4} \]

where $(\cdot)^{+}$ denotes the Moore-Penrose pseudo-inverse of a matrix. Note that the matrix $\bar{A} - \bar{B} K$ is invertible since the LQR design provides a controller matrix $K$ such that the eigenvalues of the closed-loop system have strictly negative real part.
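As a minimal sketch of the LQR step, the following NumPy code computes a gain for the augmented system and checks the closed-loop eigenvalue property that the invertibility remark relies on. The plant matrices and the weights Q and R are hypothetical placeholders, not the values used in the experiment. The Riccati equation is solved here through the stable eigenvectors of the Hamiltonian matrix, a textbook approach chosen to keep the example dependency-free; it is not necessarily the solver used by the authors.

```python
import numpy as np

def lqr_gain(A, B, Q, R):
    """Continuous-time LQR gain from the stable subspace of the Hamiltonian."""
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    H = np.block([[A, -B @ R_inv @ B.T],
                  [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0.0]  # eigenvectors of stable eigenvalues
    if stable.shape[1] != n:
        raise ValueError("Hamiltonian has no n-dimensional stable subspace")
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))      # Riccati solution
    return R_inv @ B.T @ P

# Hypothetical placeholder plant, NOT the identified mirror model.
A = -np.diag([0.5, 0.6, 0.7, 0.8, 0.9]) + 0.05 * (np.ones((5, 5)) - np.eye(5))
B = np.eye(5)
C = np.eye(4, 5)

# Augmented system with one integrator per output, as in equation (2).
A_bar = np.block([[A, np.zeros((5, 4))],
                  [C, np.zeros((4, 4))]])
B_bar = np.vstack([B, np.zeros((4, 5))])

Q = np.eye(9)  # assumed state weighting
R = np.eye(5)  # assumed input weighting
K = lqr_gain(A_bar, B_bar, Q, R)

closed_loop_poles = np.linalg.eigvals(A_bar - B_bar @ K)
print(K.shape)
print(closed_loop_poles.real.max() < 0.0)
```

Since all closed-loop eigenvalues lie strictly in the left half-plane, zero is not an eigenvalue of the closed-loop matrix, which is why the inverse in equation (4) exists.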
To demonstrate the controller's effectiveness, the controller needs to adjust the temperatures of the DM to the same value. The heaters are connected to a voltage amplifier that provides up to 100 V output voltage and is connected to the real-time system via serial ports. Starting from a certain initial temperature of the mirror, the controller needs to regulate the output voltage of the amplifier such that a desired temperature profile is achieved on the mirror surface. To this end, steps of equal height are applied to the reference signal r of the temperature controller at different times.

Figure 10: Time characteristics of steps on the desired output temperature of the closed-loop DM with heaters (five panels, Sensor 1 to Sensor 5; legend: reference value, actual measurement; horizontal axis: time in seconds, 0 to 10; vertical ticks at 0.6, 0.8, and 1)

Experimental results are shown in Fig. 10, which displays the values of all five temperature sensors over time. The temperature values are given in terms of the input voltage at the measurement input of the real-time system. The absolute temperature is not relevant, since mainly the temperature differences affect the deformation of the mirror. Starting from an initial value of 0.5 at every sensor, the desired values are increased to 1.0 individually. Fig. 10 shows that the controller is able to adjust the temperatures without steady-state error. The coupling of the heater segments leads to a slight increase of the temperature at neighboring sensors, e.g. sensors one and four, and to a small maximum overshoot of about 8 %. The time for reaching the final value differs: it is about two seconds for the fourth sensor and about eight seconds for the first sensor. On the one hand, this is a result of the thermal interconnection of the mirror. On the other hand, it is caused by the input constraints, since selective cooling of the segments is not possible with the current setup.
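The input constraint mentioned above can be made concrete with a small sketch. Since the model input u is the square of the heater voltage, a negative u (a request for cooling) cannot be realized, and the amplifier limits the voltage to 100 V. The function name `heater_voltage` and the assumption that u is expressed in V² are illustrative choices, not taken from the actual implementation.

```python
import math

V_MAX = 100.0  # amplifier output limit in volts, as stated in the text

def heater_voltage(u, v_max=V_MAX):
    """Map a controller output u (squared voltage, assumed in V^2) to a
    realizable amplifier voltage."""
    # Negative u would mean active cooling, which the heaters cannot provide.
    u_clipped = min(max(u, 0.0), v_max ** 2)
    return math.sqrt(u_clipped)

for u in (-400.0, 2500.0, 1.0e6):
    print(u, heater_voltage(u))
```

A cooling request is clipped to 0 V, so a segment that is too warm can only relax through heat conduction, which is consistent with the differing times for reaching the final value reported above.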
An improved thermal model of the system might result in a better representation of the coupling and dynamic behavior of the system. One way to achieve this is to use a higher order in the state-space model. Additionally, a dynamic feed-forward control may help improve the tracking of desired temperature profiles while meeting output constraints. Further investigation is needed to find a model that relates the temperature profile to the mirror surface such that an additional compensation may be performed, supporting the piezoelectric actuators.

5. CONCLUSION
We have investigated a novel experimental setup for adaptive optics control. As has been demonstrated, the setup shows superior performance with minimal latencies while maintaining the RCP characteristics. By using the single instruction, multiple data (SIMD) capabilities of the Intel processor (e.g. AVX2 and FMA), even large and complex controllers up to dimension 200 × 200 may be handled in reasonable time.

Additionally, we could enhance the evaluation of the SHWFS by exploiting the FPGA implementation. That is, the delay is now less than one frame period for the maximum frame rate of the HASO™ 3 Fast SHWFS. To compensate for the Gaussian intensity profile of the laser beam, and thus for the different intensity levels of the desired centroids/spots, we have applied an approximate inverse of the intensity distribution. Hence, a higher threshold value may be used, which may be decreased depending on the distance from the specified center.

The FPGA-based system is now fully functional. It is ready for testing novel wavefront sensing and evaluation approaches, and for the integration of different DMs. The next steps of investigation are to combine the low-speed thermal actuator with the high-speed piezoelectric actuators in order to maximize the class of rejectable disturbances and to better compensate thermal lens effects. The simulation results need to be analyzed further in order to optimize the controller and the data evaluation. Additionally, different DM architectures may be examined. These may show a reduced coupling such that the bandwidth of the closed loop could be increased further. Finally, elaborate investigations have to be performed to assess the robustness of the synthesized controller when subject to hysteresis effects of the piezoelectric actuators.
REFERENCES
[1] Mauch, S., Reger, J., Reinlein, C., Appelfelder, M., Goy, M., Beckert, E., and Tünnermann, T., “FPGA-accelerated adaptive optics wavefront control,” Proc. SPIE 8978, 1–12 (2014).
[2] Mauch, S. and Reger, J., “Real-time spot detection and ordering for a Shack-Hartmann wavefront sensor with a low-cost FPGA,” IEEE Transactions on Instrumentation and Measurement 63, 2379–2386 (Oct. 2014).
[3] Mauch, S. and Reger, J., “Real-Time Implementation of the Spiral Algorithm for Shack-Hartmann Wavefront Sensor Pattern Sorting on an FPGA,” Journal of the International Measurement Confederation (submitted) (2014).
[4] Smith, D. G. and Greivenkamp, J. E., “Generalized method for sorting Shack-Hartmann spot patterns using local similarity,” Applied Optics 47(25), 4548–4554 (2008).
[5] Bedggood, P. and Metha, A., “Comparison of sorting algorithms to increase the range of Hartmann-Shack aberrometry,” Journal of Biomedical Optics 15(6), 067004–067004–7 (2010).
[6] Bailey, D. G., [Design for Embedded Image Processing on FPGAs], Wiley-IEEE Press, 1st ed. (Aug. 2011).
[7] Mauch, S. and Reger, J., “Application of µ-Synthesis H∞-Control for Adaptive Optics in Laser Material Processing,” in [Proceedings of 2013 IEEE International Conference on Control Applications (CCA)], 941–947 (Aug. 2013).
[8] Mauch, S., Reger, J., and Beckert, E., “Adaptive Optics control for laser material processing,” Congreso Latinamericano de Control Automático (CLCA 2012) (2012).
[9] Gall, F. L., “Powers of tensors and fast matrix multiplication,” CoRR abs/1401.7714 (2014).
[10] Tyson, R., [Principles of Adaptive Optics, Third Edition (Series in Optics and Optoelectronics)], CRC Press, 3rd ed. (Sept. 2010).
[11] Gutzeit, N., Müller, J., Reinlein, C., and Gebhardt, S., “Manufacturing and Characterization of a Deformable Membrane with Integrated Temperature Sensors and Heating Structures in Low Temperature Co-fired Ceramics,” International Journal of Applied Ceramic Technology 10(3), 435–442 (2013).
[12] Reinlein, C., Appelfelder, M., Goy, M., Gebhardt, S., and Gutzeit, N., “Testing of thermally piezoelectric deformable mirror with buried functionality,” Proc. SPIE 8978, 897804–897804–8 (2014).
[13] Kwakernaak, H. and Sivan, R., [Linear Optimal Control Systems], Wiley-Interscience, 1st ed. (Oct. 1972).