Software Infrastructure

ELI-BL-4400-REP-00000143-A

Document No: 00000143
Edition: A
Process: REP
No. of pages: 34

Report

ELI Control System Detailed Design Volume I: Software Infrastructure

Approved by: Bruno Le Garrec, Technical Director (signed 02/04/2015)

Prepared by the CS Software Group: Alejandro Vázquez Otero, Josmar Regalado, Josef Cibulka, Karel Spálenka, Ondřej Janda, Danila Khikhlukha, Jiří Wiesner, Joao Miranda


Contents

Part I: REQUIREMENTS

1  Introduction
2  Requirements imposed by earlier decisions
   2.1  Control System specifics
        2.1.1  Control system
        2.1.2  Tango requirements
3  Functionality required from the infrastructure
   3.1  Software development requirements
   3.2  Software testing requirements
   3.3  Requirements of the software infrastructure tools
   3.4  Hardware infrastructure requirements
4  Requirements of management and monitoring of the infrastructure
   4.1  Management of the infrastructure
   4.2  Monitoring of the infrastructure
   4.3  Documentation of management processes

Part II: PROPOSALS

5  Infrastructure
   5.1  Open Source and Operating Systems
        5.1.1  Software development
        5.1.2  Control system deployment
        5.1.3  Debian - CentOS compatibility
   5.2  Operating System Virtualization
   5.3  Management tools for the base infrastructure
        5.3.1  Automated operating system installation
        5.3.2  Operating system maintenance, configuration and customization
        5.3.3  Tango control system API binaries
   5.4  Monitoring tools for the base infrastructure
   5.5  User authentication
   5.6  Hardware
6  Development Cycle
   6.1  Management tools
   6.2  Repositories and VCS
        6.2.1  VCS comparison
        6.2.2  Branching scheme
   6.3  Continuous integration approach
        6.3.1  Unit tests
        6.3.2  Continuous integration software
7  Information tools
   7.1  Database for official documents
   7.2  Internal information
   7.3  Source Code
        7.3.1  Doxygen
        7.3.2  Documenting Tango servers using Pogo
8  Full integration overview
   8.1  Full integration overview
9  Acronyms

References


Part I

REQUIREMENTS


1  Introduction

The overview of the structure of the ELI Control System is described in the Statement of Work WP 4.4 Control System [1]. The Control System software group is in charge of building the supervisory control system, usually known as the central control system or simply CCS. It is a high-level supervisory interface intended to provide an interaction layer between the operator and the logic behind the processes driving the facility.

The supporting infrastructure is the collection of hardware and software resources necessary to enable the Control System Software group to fulfill its core mission: the production of the software applications needed to control the ELI Beamlines facility. As such, the supporting infrastructure will enable software development and testing in addition to management and monitoring of the infrastructure itself.

Functional requirements are a set of specific functionalities that define what a system is supposed to accomplish. In the case of the supporting infrastructure for software development of the ELI CCS, these requirements can be divided into three layers:

1. Hardware supporting the software infrastructure of the team. This includes mainly the servers that run the tools for collaborative development and software testing.

2. Administration of the hardware and monitoring of its status.

3. Tools for software development and testing of the software used for the CCS.

2  Requirements imposed by earlier decisions

This section summarizes decisions that were taken earlier and are relevant for the design of the supporting infrastructure.

2.1  Control System specifics

A description of the CS can be found in Volume II of this document, together with a detailed description of the core technologies that have already been selected based on surveys and evaluations of different solutions [2, 3, 4]. To support the development of such a distributed CS, the following conditions must be met.

2.1.1  Control system

The general requirements for the central control system are:

1. The central control system controls and monitors several local control systems. It must be able to communicate with systems built on a variety of frameworks, such as Tango, EPICS, TINE and LabVIEW.

2. The central control system is composed of many different processes, some of which may even be running on different machines. That is, it is a distributed control system.


2.1.2  Tango requirements

The choice to build the system using Tango influences the choice of the following:

1. Operating system. Tango core tools and software built using Tango can run on many operating systems. Nevertheless, GNU/Linux - and more specifically Debian and Ubuntu, the latter itself a Debian derivative - is the OS family on which Tango is mostly used and thus best supported.

2. Programming languages. The programming languages supported by Tango are C++, Python and Java. Tools written in all of these languages can coexist in the system and communicate with each other without any problem. For example, the core parts of Tango (the database server) are written in C++, while some tools are written in Java (Astor, Jive, Pogo).
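For illustration, a minimal Tango device server written against the Python binding (PyTango high-level API) might look as follows. This is only a sketch: the device class, the attribute and the returned value are hypothetical placeholders, not part of any ELI code base.

```python
# Minimal Tango device server sketch using the PyTango high-level API.
# The device class and its attribute are illustrative placeholders.
from tango.server import Device, attribute, run


class DemoDevice(Device):
    """A trivial device exposing one read-only attribute."""

    @attribute(dtype=float, unit="degC")
    def temperature(self):
        # A real device server would read the hardware here.
        return 21.5


if __name__ == "__main__":
    run((DemoDevice,))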

3  Functionality required from the infrastructure

The prerequisites and accepted best practices for software development lead us to the following list of requirements that the infrastructure must conform to.

3.1  Software development requirements

Software produced by the Control System group should be developed with practices consistent with modern industry-standard software engineering. In this regard, the following basic requirements can be identified:

1. Collaborative development with version control of all the software produced.

2. Use of centralized repositories with software project management tools that can be hosted locally.

3. Use of automatic and nightly builds to preserve code integrity when necessary. The builds include not only compilation of the programs, but also their tests.

4. Use of reporting tools to publish the status of a given software project.

5. Presentation of documentation automatically generated from source code.

6. Backup of the source code and other important data.

3.2  Software testing requirements

Software developed using Tango will need special consideration during testing:

1. A separate instance of a Tango environment dedicated to software testing.

2. In the particular case of some hardware drivers, remote access to the machine to which the hardware is connected may be needed so that the control application has ready access to the hardware.


3.3  Requirements of the software infrastructure tools

1. Open source. One advantage of open source is that there are no license fees. Another is that the development of such software cannot be suddenly terminated by the decision of a single company.

2. Widely adopted solutions with a diverse community of users, some of whom are also developers. This, together with an open source license, guarantees that the software will be actively developed and maintained in the near future, and most likely in the more distant future as well. The diversity of the user and developer community guarantees that the fate of the software is not subject to the interests of a single person or company.

3. A support period of 2 years or longer for the OS used for software development, and 5 years or longer for deployment. Only security-related issues are addressed during the support period. Most of the software development shall take place in the ensuing two years (2015 and 2016), hence a 2-year support period is enough to allow the developers to run one particular release version of the selected distribution. Deployment requires a longer support period.

3.4  Hardware infrastructure requirements

The base hardware level (sometimes referred to as the metal layer) hosts and supports the software tools that make the overall infrastructure functional:

1. Servers on which the services of the development infrastructure will be running.

2. The network to connect the servers to each other and to the workstations of the developers.

3. Network storage space to store the data and its backups.

4  Requirements of management and monitoring of the infrastructure

Since the base infrastructure consists of both hardware and software (operating systems and applications), it must be deployed, monitored and managed. In order to ensure its integrity (from operational and security perspectives), cost-effectiveness and long-term viability, the management and monitoring of the infrastructure should abide by the principles listed below. Management and monitoring have the following common requirement:

1. Maximum consolidation/centralization of monitoring and management. It should be easy to verify the functioning status of the elements of the infrastructure and, to the maximum possible extent, their management and configuration should be done through configuration tools capable of intervening collectively on the whole infrastructure or on parts of it. Consolidation of the base infrastructure management and monitoring will thus allow further consolidation of this system into larger ELI systems if required in the future.


4.1  Management of the infrastructure

The chosen solutions must comply with the following requirements:

1. High availability and live service migration: the possibility to migrate a running base infrastructure service instance from one physical server to another. This requirement makes the infrastructure flexible, robust and able to withstand minor hardware failures while providing uninterrupted service.

2. A unified environment inside which the base infrastructure services operate.

3. Scalability. Implemented solutions should permit the growth of the infrastructure itself with minimal increase in the monitoring and management burden on the personnel. This means that uniformity of the tools and procedures used is highly desirable.

4. Maximum automation of maintenance activities. Certain activities which demand a prompt response from the system administrators, such as deploying software upgrades or security patches, should be dealt with through automatic processes. Backups of key information should be made automatically.

5. Backup of the configuration settings of the tools.

6. The infrastructure tools must support access restrictions and authentication. For example, most of the tools will be available for use by all the developers, but their configuration will be restricted to the assigned administrator(s). Not a requirement, but a useful feature, is for the tools to allow authentication using ELI domain login credentials.

4.2  Monitoring of the infrastructure

It is necessary to monitor the hardware and software to ensure early detection and analysis of problems. Careful monitoring can often even warn in advance about hardware failures, most importantly of hard disks and memory. When a problem occurs, it is necessary to have all the information needed to correctly analyze the problem and identify its source. Thus the monitoring tools must:

1. Monitor the load on the servers and workstations.

2. Monitor services running on the servers.

3. Collect and store all monitoring data.

4. Display the history of the monitored data to help in the identification and analysis of the source of a problem.

4.3  Documentation of management processes

The infrastructure and its monitoring and management procedures should be well-defined, repeatable and documented at an optimal level of detail. This will permit the continual improvement of the procedures themselves, the training of new personnel tasked with system administration activities, and the sharing of know-how between different individuals. The documentation can be explicit where possible (as when describing a particular procedure), or implicit in the process (for example, through version control of configuration files or management scripts).


Part II

PROPOSALS


5  Infrastructure

5.1  Open Source and Operating Systems

Open source [5, 6, 7] is a pragmatic approach that brings an economic advantage, as one can benefit from the developments of the open source community to build systems that are easily upgradeable and scalable. Therefore we aim to use open source as much as possible, with an in-house development model, which will have a positive impact on future maintenance chores and lower the operational costs (for example by avoiding license fees or paid software upgrades). Following that rationale, Linux is the chosen operating system (OS) kernel, and GNU the chosen OS user-land, for the development of the Control System as well as for the operation of the facility. There are various popular, high-quality GNU/Linux distributions available for free:

Distribution               Package manager   Release cycle     Support period     Number of packages
Arch                       pacman            rolling updates   not applicable      7,062
Gentoo                     portage           rolling updates   not applicable     11,841
Slackware                  tarball based     every year        5 years             1,249
Linux From Scratch         none              twice a year      none               not applicable
Debian                     apt/dpkg          every 2 years     at least 3 years   43,237
Ubuntu                     apt/dpkg          every 2 years     5 years            45,509
OpenSuse                   yum/rpm           once a year       1.5 years           5,395
Fedora                     yum/rpm           twice a year      1 year              3,020
CentOS/Scientific Linux    yum/rpm           once in 3 years   10 years            8,465

Table 1: A comparison of popular GNU/Linux distributions (data compiled from [8])

CentOS and Scientific Linux are created by recompiling the sources of Red Hat Enterprise Linux (RHEL). Scientific Linux is more of a niche project, as its developer community has fewer than 10 members. Fedora and OpenSuse are pilot distributions into which the corresponding commercial distributions - RHEL and Suse Linux, respectively - introduce new features. Ubuntu is a Debian derivative: it uses the same package manager, inherits Debian's extensive package database, and there is significant cooperation between the Debian and Ubuntu developer teams. Ubuntu is backed by a company, Canonical Ltd., which makes it closer to RHEL, whereas Debian is a true open source (free software) endeavour developed by a worldwide community.

The only distributions that satisfy the requirements of the software infrastructure tools (see section 3.3) are:

· Debian

· Ubuntu

· CentOS

There are two separate use cases with respect to the OS:

· software development

· control system deployment.

Each use case has slightly different requirements, and therefore two different solutions shall be adopted to satisfy them. The most important criterion for software development is the availability of a large number of software packages. Deployment, on the other hand, demands stability above any other feature.

5.1.1  Software development

The OS suitable for software development must allow the developers to test all available libraries, APIs, debuggers and compilers efficiently - that is, without being forced to search for the software on the internet and to compile and configure it. In other words, software availability is paramount. In this regard, there is no better choice than Debian GNU/Linux. A number of prominent academic facilities such as the European Synchrotron Radiation Facility (France), the Institute of Physics (Czech Republic) and MetaCentrum VO (Czech Republic) have built their infrastructure upon Debian. Debian shall be the OS used for development.

Some further points supporting the choice of Debian as the software development platform:

1. It has official support from the Tango CS project.

2. It strictly respects the Filesystem Hierarchy Standard (FHS) [9], which defines the directory structure, directory contents and predefined paths for the configuration files of applications in Unix and Unix-like operating systems.

3. It has an excellent package manager with access to over 44,900 packages (precompiled software bundled up in a convenient format for easy installation). The package manager also resolves dependencies when installing a new package.

4. It has a solid community of developers (a 2010 study counts 1,410 developers, of whom 873 are active [10]). There is also a partnership program which supports the development [11], and there are 356 Debian consultants (companies) listed in 49 countries worldwide [12].

5. To the best of our knowledge, Debian is the distribution most suitable for software development, as software development for the free software community is often carried out on this platform.


5.1.2  Control system deployment

Overall system stability and the support period are the major concerns for the deployment of the control system. In this respect, RHEL and CentOS excel above any other OS. The requirement of high availability (see section 4) shall be addressed by exploiting virtualization technology, and RHEL/CentOS is one of today's best virtualization platforms. For these reasons, the chosen virtualization platform (server OS) is CentOS, a binary-compatible distribution built from the RHEL source code. It might be necessary to deploy RHEL instead of CentOS to keep warranty conditions (occasionally, providers only accept RHEL in order to keep the warranty valid in case of hardware failure). Nevertheless, the necessity of running RHEL as a host OS does not imply that the virtualized OS must be RHEL as well. The Control System will provide support for other operating systems (e.g. Windows) running on virtual machines when technical reasons justify their use (e.g. the lack of drivers for Linux).

5.1.3  Debian - CentOS compatibility

All the programming languages that will be used were designed to have portable source code, so incompatibilities are mostly expected to arise from using different versions of libraries. The upcoming Debian release, version 8 (codename Jessie), and the recently released CentOS 7 (released in July 2014) are expected to be largely compatible. Table 2 lists system libraries and utilities which are deemed important for the Tango implementation of the CS with image processing capability and GUIs programmed in Qt, Java or Python.


Software                         Debian Jessie    CentOS 7
Linux kernel                     3.16             3.10
GNU C library                    2.19             2.17
GNU binutils (assembler)         2.24.90          2.23.52
GNU C++ compiler                 4.9.1            4.8.2
OpenSSL                          1.0.1j           1.0.1e
Xorg X server                    1.16             1.15
Python                           2.7.8            2.7.5
OpenJDK                          1.7.0.71-2.5.3   1.7.0.71-2.5.3
Qt                               4.8.6            4.8.5
OpenCV                           2.4.9.1          2.4.5
Fast Fourier Transform (fftw2)   2.1.5            2.1.5 (EPEL)
LAPACK                           3.5.0            3.4.2
GNU scientific library           1.16             1.15
Hierarchical Data Format 5       1.8.13           1.8.12 (EPEL)
OpenMPI                          1.6.5            1.6.4

Table 2: Version numbers of selected software found in Debian Jessie and CentOS 7 (EPEL packages are not part of CentOS; they are provided by a third-party repository)

5.2  Operating System Virtualization

The base infrastructure implementation must comply with the requirement of high availability and must also provide a unified environment (see section 4). To that effect, full OS virtualization will be employed - one virtualized OS instance per service instance. The current state-of-the-art Linux kernel supports:

1. KVM (Kernel-based Virtual Machine), which is part of the main kernel source tree [13];

2. Xen, which is available as a set of source code patches [14, 15].

Both KVM and Xen are mature and high-performing hypervisors (virtualization technologies) whose CPU performance closely approaches that of a non-virtualized OS [16, 17]; see table 3 (a truncated version of the results table found in [16]).

Application                   Task                 Unit       Non-virt.   KVM      Xen
POV-Ray                       image rendering      seconds    230.02      232.44   235.89
Smallpt                       image rendering      seconds    160         162      167.5
John the Ripper (Blowfish)    password cracking    1/second   3026        2991.5   2856
OpenSSL (RSA signing)         creating signatures  1/second   397.68      393.95   388.25
Timed MAFFT Alignment         sequence alignment   seconds    7.78        7.795    8.42

Table 3: CPU performance comparison of the KVM and Xen virtualization hypervisors


KVM is a full virtualization hypervisor that requires the (hardware) virtualization extensions of a CPU (either Intel VT-x for Intel processors or AMD-V for AMD processors) while employing para-virtualized drivers (for network, disk, clock, memory and graphics card) enabling the guest OS to use devices on the host machine. Xen was originally a para-virtualization hypervisor, but its evolution led to an implementation of a full virtualization hypervisor utilizing hardware virtualization extensions as well as para-virtualized drivers.

KVM solution   KVM shall be the adopted virtualization solution, as CentOS/RHEL, the chosen server OS, officially supports only KVM. To ease virtual machine deployment, libvirt [18] and its client utilities shall be exploited.

Live migration   OS virtualization makes it possible to run many instances of an OS (or, indeed, a different OS) in parallel as guests on a single host machine running CentOS/RHEL; see figure 1 (a migration sketch follows the figure). The guest OS instances are not tied to the host OS or to the underlying hardware, and thus may, if necessary, be migrated to another server. Live migration imposes two restrictions on the infrastructure:

1. virtual machine images must be stored on a networked storage pool (NFS in our case; see section 5.6);

2. hosts must share the same type of CPU (migration from Intel to Intel, or from AMD to AMD).

Figure 1: Layered description of the operating system virtualization technology: CentOS (host) / Debian (guest).
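To illustrate the live migration mechanism, the sketch below uses the libvirt Python bindings. The host URIs and domain name are hypothetical placeholders; in practice the libvirt client utilities (virsh) would typically be used instead.

```python
# Sketch: live-migrate a running guest between two KVM hosts via libvirt.
# Host names and the domain name are hypothetical placeholders.
import libvirt

SRC_URI = "qemu+ssh://host-a.example/system"   # source hypervisor
DST_URI = "qemu+ssh://host-b.example/system"   # destination hypervisor

src = libvirt.open(SRC_URI)
dst = libvirt.open(DST_URI)

dom = src.lookupByName("debian-guest")         # the running guest to move

# VIR_MIGRATE_LIVE keeps the guest running during the transfer; the disk
# image must reside on shared storage (NFS in our case, see section 5.6).
dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

src.close()
dst.close()
```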

5.3  Management tools for the base infrastructure

Owing to the complexity of software, special-purpose tools are used to manage (administer) the operating system (GNU/Linux), the services that run on it (wiki server, database server, web server) and the applications executed under the OS.


Since an administrator rarely takes care of a single machine, it is necessary to deploy tools that allow for maximum automation - tools capable of administering a group of computers. Cluster management tools seldom work alone; the lower-level tools - package managers and utilities - are vital for accomplishing their preset task. Cluster management tools offer centralization and scalability, and are therefore a solution very well suited for effective administration of the computers in the base infrastructure.

5.3.1  Automated operating system installation

The first step in getting the base infrastructure operational is the installation of an operating system. The installation will be performed in an automated/unattended manner to satisfy the requirements of maximum consolidation, automation and scalability referred to in section 4. The object of an automated OS installation procedure is to produce an initial, well-defined copy of the OS that will be installed on each of the machines of the CS Software group used for development purposes.

Preseeding and Kickstart   The unattended installation method will utilize the native OS installers (debian-installer from the Debian project, anaconda from CentOS) and package managers. The installers are supplied with the answers to the questions asked during the installation process before the actual installation begins. This is called preseeding the Debian installer [19], while the term Kickstart [20] was coined for the similar functionality under CentOS. Preseeding and Kickstart are both the method of first choice, since both of them exploit the standard and supported procedures for installing the OS.
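As a sketch of the mechanism, a few lines of a hypothetical Debian preseed file are shown below; the mirror host and package selection are illustrative placeholders, and the actual answer keys used in production would follow the installation guide [19].

```
# Fragment of a hypothetical preseed file for debian-installer.
d-i debian-installer/locale string en_US.UTF-8
d-i keyboard-configuration/xkb-keymap select us
d-i mirror/http/hostname string ftp.debian.org
d-i partman-auto/method string lvm
# Extra packages installed on every development machine (illustrative).
d-i pkgsel/include string openssh-server puppet
```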

xCAT   In order to further facilitate and automate the installation procedure, the xCAT cluster administration toolkit [21] shall be leveraged to carry out local disk provisioning via preseeding or Kickstart.

5.3.2  Operating system maintenance, configuration and customization

The requirement of consolidation and automation of software maintenance (section 4) implies that the solution must allow the administrator to:

1. install and update software using the OS package manager,

2. configure, start or stop services,

3. create and remove user accounts,

4. execute a command.

All of these actions must be possible on a whole group of managed computers. The GNU/Linux system is assembled from many separately developed software parts, so the process of their configuration is by no means uniform. To overcome this difficulty there is a host of fitting solutions. Based on the experience of the administrative team of IT4Innovations [22], we selected Puppet [23, 24] and etckeeper [25].

Puppet   Puppet is typically used to enforce configuration changes on a group of computers. Puppet treats a system's configuration as a collection of so-called resources. A resource, which is an instance of a resource type, can be a user account, a file, a running service, a software package, etc. Each of the resource types has a set of attributes, and each attribute has a value. This makes it possible to overcome the non-uniformity mentioned above. Puppet uses its own declarative language to describe the set of configuration changes to be performed [26].
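As a sketch of this declarative style, the hypothetical manifest below keeps an NTP daemon installed and running on every node to which it is applied; the resource names are illustrative, not taken from any ELI configuration.

```puppet
# Hypothetical Puppet manifest: keep the NTP daemon installed and running.
package { 'ntp':
  ensure => installed,
}

service { 'ntp':
  ensure  => running,
  enable  => true,
  require => Package['ntp'],   # start the service only once the package exists
}
```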

etckeeper   The etckeeper program is a utility that uses a version control system to store the changes made to the /etc directory. It accomplishes this by hooking into the OS package manager.

5.3.3  Tango control system API binaries

The requirements of maximum consolidation and environment unification (see section 4) shall be addressed by serving the Tango binaries via the Network File System (NFS) protocol [27]. Developers' machines (the clients) shall be configured to mount, access and use the served Tango binaries for CS development. This will greatly simplify maintenance and ensure that all the Tango servers run the same version of Tango. It will also make the updating procedure smooth, as an update only has to be performed once on the NFS server, after which the updated binaries are available to all the client machines.
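A minimal sketch of what such a client configuration could look like, assuming a hypothetical NFS server name and export path:

```
# Hypothetical /etc/fstab entry mounting the shared Tango binaries read-only.
nfs-server.example:/export/tango  /opt/tango  nfs  ro,hard,intr  0  0
```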

5.4  Monitoring tools for the base infrastructure

A monitoring solution shall be deployed for the services and servers used:

1. in software development,

2. to support the functioning of the CS and the ELI facility.

In order to satisfy the requirements concerning infrastructure monitoring (see section 4.2), the adopted solution must provide the means to monitor:

1. hardware: disk health, CPU, memory and network;

2. software: OS services, running processes and the file system.

Again, there are various eligible open source monitoring systems, such as Ganglia, Zabbix and Nagios.


Nagios   Nagios is the adopted solution on account of its excellent documentation and its record of more than 10 years of active use and development. Nagios is a solution for servers and switches, capable of monitoring both hardware and software. Its rich functionality includes an elaborate web interface, powerful script APIs, event handlers that allow automatic restart of failed applications and services, historical reports, and alerting via email or SMS. The part of Nagios running on the monitored computers relies on plugins, and is therefore very customizable.
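As an illustration of the configuration style, a hypothetical Nagios object definition checking the SSH service on one development server might look as follows; the host name, address and templates are placeholders (the templates shown ship with the Nagios sample configuration).

```
# Hypothetical Nagios object definitions: one host and one service check.
define host {
    use        linux-server          ; template from the sample configuration
    host_name  dev-server-01
    address    192.0.2.10            ; documentation-range IP, placeholder
}

define service {
    use                  generic-service
    host_name            dev-server-01
    service_description  SSH
    check_command        check_ssh
}
```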

5.5  User authentication

In order to satisfy the requirements of access restriction and authentication (see section 4), the Kerberos authentication system (version 5) shall be introduced to facilitate the administration of user accounts and to secure the NFS services. The storage solution (see section 5.6) supports Kerberos authentication of its NFS clients and even encryption of NFS packets. A Kerberos authentication server and a ticket-granting server are provided by the administrative team at the Institute of Physics. Kerberizing our network shall provide us with a standard, transparent solution, which could also later be implemented in the HPC cluster.
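For illustration only, a client's /etc/krb5.conf could begin as sketched below. The realm and KDC host names are hypothetical placeholders; the real values are defined by the Institute of Physics administrative team.

```
# Hypothetical fragment of /etc/krb5.conf on a developer workstation.
# Realm and KDC names are placeholders, not the production values.
[libdefaults]
    default_realm = FZU.EXAMPLE

[realms]
    FZU.EXAMPLE = {
        kdc = kerberos.fzu.example
        admin_server = kerberos.fzu.example
    }
```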

5.6  Hardware

In agreement with the requirements described in section 3.4, a series of decisions regarding the hardware configuration for hosting the software development, monitoring and management services is presented here. Some of the software services (the web-based ones) will run on this hardware configured in high-availability mode; these services will therefore be mirrored to increase the system's fault tolerance. The hardware configuration described below is depicted in figure 2 (shown at the end of this section), where all components fit in a standard 19-inch rack. It should be noted that, except for some small accessories, all the hardware has already been purchased (Tender no. S14/173E).

Servers   All the monitoring and administration tools described herein will be hosted on these servers. For better maintainability and reliability, each of these tools will run in a separate virtual machine. To ensure that enough hardware resources are available to each virtual machine, and considering future expansion of the number of virtual machines, it was decided that the host server be composed of two Xeon-type CPUs, each with 6 cores, and a total of 128 GB of RAM. The mirror server has the same parameters.

CUDA/OpenCL Server   For the development and running of data processing algorithms, and for parallel programming in general, a solution fully compatible with the DAQ and the CS architecture (RDMA, TANGO and C++) was chosen. A GPU server with the following specifications was purchased: two Xeon CPUs with 6 cores each, 256 GB of ECC REG RAM, four Tesla K40M GPU coprocessors, and data storage with at least 16 GB of raw HDD space.


Network Storage   A storage solution is provided by a NAS server with multiple connectivity ports (4 x RJ-45 1 GbE with Link Aggregation / Failover support), high data throughput and an easy-to-use web interface. The total required amount of space available to users is 24 TB with RAID 10, with a reading speed of at least 340 MB/s and a writing speed of at least 190 MB/s. A backup NAS unit with similar performance and disk storage size was also purchased.

The Support Hardware   To provide reliability and connectivity, and to ease maintenance, the following support hardware was selected:

· Power backup.

· A 19-inch rack large enough to hold all the equipment (expected 15-18 U), which means a depth of at least 600 mm.

· RDMA network adapters with GPUDirect support (Mellanox ConnectX-3 VPI), plus the network elements necessary to connect the GPU server to another server/workstation.

· A router or switch with 16+ 1 GigE ports, PoE, 1 or 2 SFP ports, QoS and VLAN support (to be decided/discussed).

Network Connectivity   For software development, a simple connection to the standard ELI Office network will be sufficient.


Figure 2: List of HW components that will be in the rack, intended to support the software infrastructure.


Requirement                    Description
2 x Server Class Computer      Software tools hosting and mirroring
GPU Server                     Simulation algorithm development / image processing
NAS Storage + backup           Storage of development data and cluster results
19-inch Rack                   Housing for all the equipment
Router/Switch                  Automatic assignment of local IP addresses
Power Supply Backup            To overcome main power supply failure
2 x RoCE network adapter       GPUDirect compatible + accessories
Cluster connectivity           Connection to the ELI Office network
Servers + NAS connectivity     Connection to the ELI Office network

Table 4: Detailed hardware specifications for the base infrastructure.


6  Development Cycle

This chapter describes the parts of the development cycle and shows how the chosen solutions meet the software development requirements of section 3.1 and the best practices of code management and development. In software engineering, a software development cycle (also known as a software development methodology) is a division of the software development process into the following phases, with the intent of better planning and management:

· Requirement gathering and analysis.

· Design.

· Implementation or coding.

· Testing.

· Deployment.

· Maintenance.

Each phase produces deliverables required by the next phase in the development cycle. For example, the result of the testing phase is the verification that the deliverable of the implementation phase satisfies the requirements. This section is an overview of the tools which help to follow this life cycle. Used together, they form a well-organized environment for the development, testing and maintenance of the code. This helps to keep the focus on the task at hand rather than on technical questions of collaboration and code sharing. An overview of the system is shown in figure 3.

6.1  Management tools

To handle the complexity of the development processes imposed by section 3.1 and to improve the performance of code development, Redmine was chosen as the project management tool.

Project management software is not just for managing software-based projects; it can be used for a variety of other tasks too. Web-based software should provide tools for planning, organizing and managing resources to achieve project goals and objectives. An advantage of a web-based system is that it is not necessary to install any additional software on every development machine. Such software can be easy to use, with access control features (multi-user regime), which is another big advantage. Modern management software can be used for many tasks such as issue tracking, calendars, Gantt charts, email notification and much more.

The open source project Redmine [28, 29] has been selected in order to fulfill software requirement 2 of subsection 3.1. Redmine is a flexible project management web application, which is cross-platform and cross-database. It can handle multiple projects and an unlimited number of subprojects. It has embedded graphical tools to aid the visual representation of projects and their deadlines. This solution was chosen over similar alternatives such as Trac (which has a less streamlined interface) and GitHub and GitLab (good quality, but with prohibitive prices).


Figure 3: Continuous integration methodology scheme

6.2  Repositories and VCS

According to agile methodology [30], the code development process should be split into distinct tasks for better performance. Each task aims to perform a well-defined action on the code base, and is represented by a branch in the version control system. The following description presents the tools that will be used at ELI for the code development and code management processes.

6.2.1  VCS comparison

Two version control systems can be considered to satisfy requirement 1 from subsection 3.1 and implement the described branching model: svn and git [31, 32]. These two systems are the most commonly used and have de facto become the standard tools for repository management.


Figure 4: Git repositories displayed in Redmine

Both git and svn have the widest communities of developers, and both fulfill the requirement from subsection 2.1.2 of having GNU/Linux support. The advantages of git can be broadly enumerated:

1. Git is distributed: every contributor has his own copy of the full repository and therefore does not need to be connected to a central server all the time.

2. Another advantage of a distributed system is that if the central repository collapses for any reason, it can easily be restored from the developers' copies. This makes the central repository extremely resilient to any disaster.

3. Git is much faster than SVN [33] and has an excellent merging tool. It was created to support Linux kernel development, where dozens of programmers commit continuously; that is why git resolves merge conflicts quickly and efficiently.

Fig. 4 shows the git branch review tool integrated into Redmine.

6.2.2  Branching scheme

In agreement with best practices [34], a three-level branching system was chosen for the code development process. This scheme is presented in Fig. 5. This section gives an overview of the branching scheme.


Figure 5: Branching scheme

dev   While working with the development branch, each contributor has a local copy of it and commits his changes to the dev branch. Code review is done at this level. A contributor is obliged to commit not only his changes but also unit tests [35], which are also verified by a reviewer. The continuous integration system works at this level as well: it runs all unit tests which have been committed and performs nightly builds of the whole branch. Once the list of requirements or the plan for the current release is made, it is time for a code freeze.

code freeze   The process of moving code from the dev branch to the testing branch.

testing   At this level the software is being prepared for release. Once the entire code base from the dev branch has been moved to the testing branch, integration tests and regression tests are run at this level. When the testing cycle finishes and all tests succeed, the software is ready to be released.

release   The process of moving code from the testing branch to the stable branch.

stable   The branch which carries the most recent version of tested code. Since code is merged into stable from the testing branch, releases should be taken only from the stable branch, because it stays unchanged until the next release. This methodology is also applicable to cases where there are minor and major releases.

conflicts resolving   Each contributor is responsible for resolving merge conflicts while delivering his change to the dev branch. There should be no conflicts when merging from dev to the testing branch, since the testing branch has not been modified since the last code freeze. The same applies to transferring code to the stable branch.

hot fix   It may happen that a bug is found during testing, after the code freeze. Then a special code delivery called a hot fix is prepared in order to resolve the issue. A hot fix is prepared in the dev branch, tested with unit tests and then forwarded to testing. Once the hot fix is committed to the testing branch, the testing cycle should be relaunched.

This scheme is flexible enough to be adjusted for any size of development group and meets the general software development requirements well.
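For illustration, the branch transitions described above could be realized with git commands along these lines; the branch layout matches the scheme, but the feature name and tag are placeholders.

```
# Deliver a reviewed change, with its unit tests, to the dev branch.
git checkout dev
git merge --no-ff feature/my-task

# Code freeze: move the current dev state to the testing branch.
git checkout testing
git merge dev

# Release: after all integration/regression tests pass on testing.
git checkout stable
git merge testing
git tag -a v1.0 -m "release"      # version tag is illustrative
```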

6.3  Continuous integration approach

Continuous integration is an essential part of the agile methodology. It was originally intended to be used in combination with automated unit tests [36]. Initially this was conceived of as running all unit tests in the developer's local environment and verifying that they all passed before committing to the code repository. This helps avoid one developer's work in progress breaking another developer's copy. Later elaborations of the concept introduced build servers, which automatically run the unit tests periodically or even after every commit and report the results to the developers.

6.3.1 Unit tests Unlike for instance integration test which checks the correct inter-operation of multiple subsystems code unit test [35] is something smaller but not less important.

Unit test

species and checks one point of the contract of a single method of a class. Unit tests have the following requirements.

·

A very narrow and well

·

Complex

·

dened scope.

dependencies and interactions to the outside world are emulated.

Every piece of code

should be

covered

with an appropriate unit test (and it is

better to have more than one). The above means that the number of tests is huge and one should run them every time when code is changed to ensure that everything still works as expected.

And here it is

where continuous integration software becomes relevant.
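A minimal sketch in Python's standard unittest framework illustrates the narrow scope of a unit test; the function under test is a hypothetical placeholder.

```python
# Minimal unit test sketch using Python's standard unittest module.
# The function under test is a hypothetical placeholder.
import unittest


def scale_readout(raw, gain):
    """Convert a raw detector readout to a calibrated value."""
    if gain <= 0:
        raise ValueError("gain must be positive")
    return raw * gain


class ScaleReadoutTest(unittest.TestCase):
    def test_scales_by_gain(self):
        self.assertAlmostEqual(scale_readout(2.0, 1.5), 3.0)

    def test_rejects_nonpositive_gain(self):
        with self.assertRaises(ValueError):
            scale_readout(2.0, 0.0)


if __name__ == "__main__":
    unittest.main()
```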

6.3.2  Continuous integration software

A dedicated continuous integration server operates on the git repository. It performs the two essential tasks of the continuous integration approach: automatic builds and automatic test execution. In accordance with the software requirements of section 3.1, the following tools were selected to cover these tasks.

Automatic building and testing tool - Jenkins   To meet requirement 3 from subsection 3.1, it is proposed to use the Jenkins tool. It is an application that regularly executes and monitors builds of software [37]. Builds can be started by various means, including being triggered by a commit in a version control system, scheduling via a system mechanism, building when other builds have completed, and requesting a specific build URL. In the current approach it will be set up to run all the builds and tests after every commit to the dev branch. Such an approach obviously facilitates code integration and gives almost immediate feedback to the developers. Jenkins is a well-known, industrial-quality continuous integration tool with the largest user base at present. It has been tested and works well with the other tools mentioned in this section.

Reporting tool - CDash   To meet requirement 4 from subsection 3.1, CDash was selected [38]. It is an open source, web-based reporting tool (dashboard). CDash aggregates, analyzes and displays the results of software testing processes submitted by contributors. An example of a CDash report is shown in Fig. 6. Integration of CDash into the workflow involving the tools proposed in this document has already been deployed at ELI with satisfactory results.

Figure 6: Screenshots of CDash with test results


7  Information tools

7.1  Database for official documents

For document management, the open source version of Alfresco [39] will be used. Alfresco was chosen based on a survey of existing alternatives [40]. This document management system will be used for official documents written by the members of the Systems Engineering team. Alfresco stores files in a directory structure. In addition, the documents are equipped with metadata such as information about the author, tags (a predefined list of document types), the document title and a description. Users can perform a full-text search on the database, a search based on the metadata values, or a combination of both.

7.2  Internal information

The requirements from section 4.3 will be satisfied by using the open source wiki engine MediaWiki [41]. It is the engine behind Wikipedia and many other wikis. Its wide user base guarantees a vast amount of documentation and available extensions. Two instances of MediaWiki will be deployed:

1. The first is available to all the members of the team. It will contain, for example, the list of the software that will be used and links to the documentation of that software. In particular, the tools listed in this document will be included.

2. The second is available only to the team members responsible for the administration of the hardware and software of the team. This wiki contains details of the configuration of the tools and links to the documentation of the software in use.

7.3  Source Code

To satisfy requirement 5 from subsection 3.1, the following source code documentation approaches will be followed.

7.3.1  Doxygen

Doxygen [42] is a tool for writing software reference documentation. It generates the documentation from source code comments, requiring only some specific markup inside the comments. Among other useful features, it automatically creates inheritance diagrams to visualize class relations. The Doxygen-generated documentation will be available in Redmine.
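As a sketch of the markup style, a hypothetical Python function documented with Doxygen's special comment blocks could look like this:

```python
## @brief   Compute the mean of a series of intensity readings.
#  @param   samples  Iterable of numeric readings (hypothetical data).
#  @return  The arithmetic mean of the readings.
#  @throws  ZeroDivisionError if the iterable is empty.
def mean_intensity(samples):
    values = list(samples)
    return sum(values) / len(values)
```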

7.3.2  Documenting Tango servers using Pogo

The documentation of the attributes and commands of every Tango device server is written using Pogo, a tool with a graphical interface that simplifies the creation of Tango device servers. The provided information describes the purpose and functioning of each attribute or command, and is available in multiple places:

1. During operation, the operator will see this information in standard Tango tools such as the Device panel and ATKPanel; see Fig. 7.


2. The documentation of the Device server generated by Pogo.

Figure 7: Command documentation from Pogo displayed in Device Panel.


8  Full integration overview

8.1  Full integration overview

Figure 8 depicts all the distinguishable layers involved in the maintenance and monitoring of the previously described chores required by the base infrastructure. There are three logically split software layers, covering everything from maintenance to operation (that is, software development), plus one extra layer for the metal, composed of development workstations and servers providing network services.

Figure 8: Overview of the tools required by the base infrastructure through a layered description.


9  Acronyms

API     Application Programming Interface
BMS     Building Management System
CCS     Central Control System
CS      Control System
ELI     Extreme Light Infrastructure
EPICS   Experimental Physics and Industrial Control System
GUI     Graphical User Interface
KVM     Kernel-based Virtual Machine
HDF     Hierarchical Data Format
HVAC    Heating, Ventilation and Air Conditioning
OS      Operating System
RDBMS   Relational DataBase Management System
RHEL    Red Hat Enterprise Linux
SCADA   Supervisory Control And Data Acquisition
SoW     Statement of Work
SQL     Structured Query Language
WP      Work Package
VCS     Version Control System


References

[1] "Systems Engineering Statement of Work WP4.4 Control System", ELI-BL-4500-SOW-00000076-A, 31/1/2014, TeamCenter #00086960.

[2] "Tango/EPICS report", ELI-BL-4400-REP-00000067-A, 17/12/2013, TeamCenter #00085361.

[3] "Test report of the tango2epics gateway", ELI-BEAMS, Prague, Czech Republic, Tech. Rep. 00000106, February 2014.

[4] "Archiving a set of EPICS variables in Tango", ELI-BEAMS, Prague, Czech Republic, Tech. Rep. 00000108, February 2014.

[5] [Online]. Available: http://opensource.org/

[6] [Online]. Available: http://itmanagement.earthweb.com/osrc/article.php/3717476/Interview-with-Richard-Stallman-Four-Essential-Freedoms.htm

[7] [Online]. Available: http://www.catb.org/~esr/

[8] "DistroWatch: Put the fun back into computing. Use Linux, BSD." [Online]. Available: http://distrowatch.com/

[9] "Filesystem Hierarchy Standard". [Online]. Available: http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

[10] "Debian developers". [Online]. Available: http://www.perrier.eu.org/weblog/2010/08/07#devel-countries-2010

[11] "Debian partners". [Online]. Available: http://www.debian.org/partners/

[12] "Debian consultants". [Online]. Available: http://www.debian.org/consultants/

[13] "Kernel-based Virtual Machine main page". [Online]. Available: http://www.linux-kvm.org/page/Main_Page

[14] "Xen project". [Online]. Available: http://wiki.xen.org/wiki/Xen_Project_Software_Overview

[15] "Xen documentation". [Online]. Available: http://www-archive.xenproject.org/files/Marketing/HowDoesXenWork.pdf

[16] "Performance benchmarks: KVM vs. Xen". [Online]. Available: https://major.io/2014/06/22/performance-benchmarks-kvm-vs-xen/

[17] "Baremetal vs. Xen vs. KVM - redux". [Online]. Available: https://blog.xenproject.org/2011/11/29/baremetal-vs-xen-vs-kvm-redux/

[18] "libvirt: the virtualization API". [Online]. Available: http://libvirt.org/

[19] "Debian installation guide". [Online]. Available: https://www.debian.org/releases/jessie/installmanual

[20] "Red Hat Enterprise Linux 7 installation guide". [Online]. Available: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Installation_Guide/index.html

[21] "xCAT main page". [Online]. Available: http://sourceforge.net/p/xcat/wiki/Main_Page

[22] "IT4Innovations main page". [Online]. Available: http://www.it4i.cz

[23] "Open source Puppet". [Online]. Available: https://puppetlabs.com/puppet/puppet-open-source

[24] "Puppet Labs: IT automation software for system administrators". [Online]. Available: https://puppetlabs.com

[25] "etckeeper main page". [Online]. Available: http://etckeeper.branchable.com/

[26] "Learning Puppet - resources and the RAL". [Online]. Available: https://docs.puppetlabs.com/learning/ral.html

[27] "Linux NFS project". [Online]. Available: http://wiki.linux-nfs.org/wiki/index.php/Main_Page

[28] "Redmine project". [Online]. Available: http://www.redmine.org/

[29] "Redmine", Wikipedia entry. [Online]. Available: http://en.wikipedia.org/wiki/Redmine

[30] "Agile home". [Online]. Available: http://agile.vtt.fi/publications.html

[31] "SVN project". [Online]. Available: http://tortoisesvn.net/

[32] "Git project". [Online]. Available: http://git-scm.com/

[33] "Git to SVN comparison". [Online]. Available: https://git.wiki.kernel.org/index.php/GitSvnComparison

[34] "Microsoft branching model". [Online]. Available: http://msdn.microsoft.com/en-us/library/ee782536.aspx

[35] [Online]. Available: http://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Testing/Unit_Tests

[36] "Continuous integration", Wikipedia entry. [Online]. Available: http://en.wikipedia.org/wiki/Continuous_integration

[37] "Jenkins project homepage". [Online]. Available: http://www.jenkins-ci.org/

[38] "CDash project homepage". [Online]. Available: http://www.cdash.org/

[39] "Alfresco project homepage". [Online]. Available: https://www.alfresco.com/

[40] "Wiki and document storage system for SE", ELI-BL-4000-REP-00000099-A, 18/3/2014.

[41] "MediaWiki project". [Online]. Available: https://www.mediawiki.org/wiki/MediaWiki

[42] "Doxygen project". [Online]. Available: http://www.stack.nl/~dimitri/doxygen/
