SENSOR INFORMATICS

E.H. Dooijes F.C.A. Groen

Department of Computer Science University of Amsterdam

Sixth Edition, January 1999 / October 2006

Table of contents

1 Introduction
    1.1 Aims and scope
    1.2 Unstructured versus structured data
    1.3 Relation between the real world and the world model
    1.4 Data exploration and model building
    1.5 Applications of sensor informatics

2 Sensor properties
    2.1 Conversion of physical quantities into computer readable form
    2.2 Sensor properties
    2.3 Uncertainty in the sensor value
    2.4 Sensor types
    2.5 Image sensors
    2.6 Mechanical signals
    2.7 Standards for measurement systems
    2.8 Virtual sensors
    2.9 References

3 Continuous-time signals and systems
    3.1 Least-squares approximation of a function
    3.2 Orthogonal functions
    3.3 Fourier series expansion of a function
    3.4 Energy and power; power spectrum
    3.5 Example: an impulse function
    3.6 The Fourier transformation
    3.7 The uncertainty principle
    3.8 Some properties of the Fourier transform
    3.9 Linear time invariant systems
    3.10 Example: low-pass filter
    3.11 Low-pass filters in general
    3.12 High-pass and band-pass filters
    3.13 Deconvolution
    3.14 Non-linear systems
    3.15 An introduction to stochastic signals
    3.16 Power spectrum
    3.17 References and further reading

4 Discrete-time signals
    4.1 Discrete-time signals
    4.2 Discrete Fourier transform (DFT)
    4.3 Discrete-time Fourier transform (DTFT)
    4.4 Sampling continuous-time signals; Nyquist's theorem
    4.5 Reconstruction of continuous-time signals
    4.6 Discrete LTI systems
    4.7 Stochastic discrete-time signals
    4.8 References

5 Information and communication
    5.1 Entropy
    5.2 The capacity of a transmission channel
    5.3 Error-detecting and error-correcting codes
    5.4 Transmission of analog signals: amplitude modulation
    5.5 Transmission of analog signals: frequency modulation
    5.6 Transmission of digital signals
    5.7 References

6 Compression of sensor data
    6.1 Entropy encoding
    6.2 LZW coding
    6.3 JPEG image coding
    6.4 Coding of image sequences
    6.5 Run encoding
    6.6 References

7 Structuring by modeling
    7.1 Modeling of sensor data
    7.2 Estimation of model parameters
    7.3 Stochastic models
    7.4 Modeling of speech
    7.5 References

8 Structuring by interpretation
    8.1 Feature space
    8.2 Probability density function of the classes
    8.3 Bayes' rule
    8.4 Nearest neighbour method
    8.5 Evaluation of classification results
    8.6 Relation between number of learning samples and features

9 Control systems
    9.1 Introduction
    9.2 A robot controller
    9.3 The z-transformation
    9.4 Poles and zeros
    9.5 A discrete-time feedback system

Appendix A: An introduction to complex numbers

Appendix B: Some basic statistics

Appendix C: Error propagation (Foutenvoortplanting; in Dutch)

In the 6th edition of this Sensor Informatics syllabus a number of errors (mainly typos) from the previous edition are rectified. Chapter 9 is new. EHD - December 1998. In the current pdf version the pages were renumbered, and two or three minor errors were corrected. EHD, October 2006.


1. Introduction

Sensor informatics deals with input and output of information systems from and to the real world. Traditionally it is the human observer who is the interface between the real world and the information system. Information systems that sense the outside world without the direct intervention of a human observer will become increasingly important. The techniques applied here can also be used when the signals do not originate from physical quantities but are, for instance, stock-market quotations or dollar exchange rates. This kind of data can also be regarded as valid sensor data. The present course surveys this field and provides an introduction to courses in computer vision, digital signal processing, coding, and pattern recognition. We will give a survey of the different aspects of sensor informatics and focus on the coherence between the different fields involved.

1.1 Aims and scope

In many traditional applications, input to an information system is given by striking keys on a keyboard. This input is basically textual information. Interaction devices such as a mouse make it possible to supply graphical data to the information system, and also to interact with pictorial information. As such, this kind of interaction gives a new dimension to working with an information system on pictorial data, which is essential for e.g. CAD/CAM systems. Using an interaction device to choose from a menu or select items from dialogue boxes makes working with an information system easier, but does not create essentially new possibilities.

Input obtained from sensors will play an increasingly important role in information systems of the future. In multi-media information systems, for instance, sound and pictorial information are typical examples of data obtained by sensing devices (microphones, video cameras), which are virtually impossible to enter through the keyboard or an interaction device.
Applications in which the sensor information is used to interact with or control the environment are traditionally studied in system theory and control theory. In many technical information systems the acquisition and processing of sensor information is needed to control a process. A simple example is a central heating controller, which switches on the heater when the temperature is below the desired level, and switches it off when the temperature is too high. In modern computerized heater controllers the measured temperature is compared to the heating profile and this heating profile is adjusted for the next day to minimize energy consumption. We are hardly aware of many other small digital control systems, like those used in modern TV sets to control the tuner frequency and the optimal adjustment of the set. Complex examples are the automatic pilot in airplanes, process control in the chemical industry, and many more. In general, the field of control theory is directed to the automatic control of processes in which real-time processing is essential.

Telecommunication and coding is another important field with which we will be involved. More often than not (in particular, in image handling) one has to deal with large quantities of data. Data has to be transported, stored, and retrieved efficiently. This requires coding of the data so as to remove any redundancy. In many cases, one can go even further because a certain loss of information can be tolerated in the coding process. Very efficient coding can be obtained by interpretation of the sensor data. For example, sending text as bit patterns by fax costs many more bytes than coding the characters in ASCII (one byte per character), even if information about font and position is also transmitted in the latter case. Text recognition (often called OCR, optical character recognition), a special application of image recognition, is needed to achieve this. Conversely, in telecommunication in particular, coding is also used to add redundancy to the data in order to detect or compensate for information losses during transport or storage.

1.2 Unstructured versus structured data

Sensor information always gives only a limited view of the environment. Only that part of the physical reality is perceived for which sensors are connected to the system. When for instance a microphone and a video camera are attached to the system, sound signals and images are obtained; but beyond that, the system is blind to other possible information from the environment.
Compare this to the richness of human sensing. Besides hearing and vision, even the human skin is capable of 'measuring' a number of quantities. The most obvious of them is pressure; related quantities are shear and slip. Temperature, and hence heat conduction and humidity, are felt. Air flow is perceived through motion of hairs and cooling by forced evaporation. For all these quantities perception stops after some time: the skin is only sensitive to changes. Hence, for corresponding sensors dynamic properties must also be specified. Thus one has permanent, transient and periodic versions of a quantity.

Secondly, sensor information is inherently unstructured. We may record images on a photo-CD and play them back in an arbitrary order, but we cannot recall them automatically based upon their contents: the images are known to the system by their record number, not by their contents. Automatic retrieval based on the image contents requires image data interpretation. Data structuring can be done by the computer if it is able to interpret the data; otherwise the structuring has to be done by a human observer.


Automatic recognition of objects present in images, and recognition of speech, speakers or music in signals, is a well-known hard problem which at the current state of the art can only be solved within a limited context. The reason why it is difficult has to do with the large variability possible in the appearance of, for instance, a 3D object. Depending upon the direction of the illumination, the viewing angle, and how the object is placed in a 3D world, completely different images may be the result. A human is extremely good at this recognition task. Interpretation of an arbitrary video scene, which bears only the 2D projection information, is no problem at all for a human being. This may be interpreted as a proof of existence that the information present in a video signal is sufficient for automatic understanding. However, we do not know how a human observer is capable of doing it. So, for us dealing with sensor data processing in information systems, automatic interpretation of an arbitrary scene is impossible for the time being. Only for very specific problems is the situation different, for example: interpretation of electronics diagrams, character recognition of machine-written text (OCR), and speech recognition with a limited vocabulary and a limited number of speakers. Often human interaction is still needed to verify the result, or to resolve detected errors. So the major problems (and research items) are not in data acquisition but in the interpretation of the sensor data.

1.3 Relation between the real world and the world model

As discussed in the previous section, sensor data is inherently unstructured. Automatic interpretation is possible only if we have some general knowledge about the sensor data. This knowledge can be represented by a model, which defines the context and what kind of information has to be extracted from the sensor data.
When dealing with pages of written text, the appropriate model will describe the text in terms of fonts, sizes, and lay-out; and only text will be recognized. When figures or photographs are present in the text, these will be discarded since they do not fit in the model. This world model is an abstraction from the real world, but sufficient for the purpose for which the system was developed.

Another example is a geographical information system. Existing maps are updated based upon remote sensing images. The world model consists in such a case of the positions of roads, waterways, railway tracks, and the borders between the different regions. No other information present in the remote sensing images can be brought in because the world model cannot accommodate it. For the guidance of a mobile system to avoid collisions, a map of the surroundings is needed to demarcate forbidden regions where obstacles are present.

Often, many considerably different representations are possible for the same world model. Which one is most suitable depends upon the operations which have to take place upon the data. Clever choices here can make dramatic improvements in the performance of a system.


1.4 Data exploration and model building

Often the world model is not known a priori, but has to be discovered from patterns and trends present in the sensor data. Besides statistical analysis and the like, visualisation is an important tool for exploring data. Visualisation should appeal to the human observer's unsurpassed ability to recognize patterns in the images he perceives in his daily life. Visualisation of signals, two-dimensional images, and projections of three-dimensional data is more or less straightforward. Interaction devices give the possibility to interact with the visualised data, to change grey-tones, colour palettes or the viewing angle. Animation gives the possibility to emphasize changes with time.

The situation becomes much more difficult when we have to deal with data that has an inherent dimensionality higher than three. Such a situation occurs, for example, when we measure more than three features of objects: each object is represented by a point in a high-dimensional feature space. A simple illustration is the measurement of length and height of vehicles. When we plot the length along the horizontal axis and the height along the vertical axis, each vehicle is represented by one point in this two-dimensional space. The points in the space will cluster for cars and trucks. Different techniques exist to explore these higher dimensional spaces, like projecting the points to a subspace.

1.5 Applications of sensor informatics

Applications of sensor informatics can be divided into three mainstreams:
• storage, retrieval and transportation of sensor data;
• decision support based upon sensor data;
• interaction with the environment based upon sensor data.

The possibilities of modern computer equipment with regard to processing speed and memory capacity have made it feasible to manage large sensor databases. A major application is the storage and retrieval of images or signals.
Examples are geographic information systems, medical image databases, hospital information systems, museum databases, and document image databases. Often, besides storage and retrieval, processing of the sensor data is needed. Processing is needed, for instance, to enhance images, to compare images with existing abstract descriptions of the image content such as maps, or to align images with each other (registration). A field of great interest is indexing images based upon pictorial information. Most applications require specific techniques for measuring certain features of an image: in cardiologic images, measurement of the size or distribution of blood-vessel structures is important; in forensic databases, images are classified according to certain features of fingerprints.

A second family of applications is decision support systems using sensor data. Pattern recognition and feature extraction techniques are important here. Well-known applications are optical character recognition and the automatic reading of postal checks. Also, systems for medical diagnosis and surveillance are examples in this category. An area of increasing importance is verification for fraud prevention and admittance. Issues in this context are speaker and writer verification. Industrial examples are the detection of exceptional states of machines, chemical processes or nuclear reactors.

In the third category of applications, the interpreted sensor data is used to control the environment by means of actuators. Examples are mail sorting systems based upon the postal code, mobile vehicles avoiding obstacles, or active robot vision systems focussing on moving targets. When the result of the action is perceived in turn by the sensing system, a closed loop is created. Such a loop can be used to control and stabilize a process, but closed-loop systems can become unstable. Instability is related to delays and amplification factors within the loop, so it is important to investigate how the feedback in the control loop is realized. Sometimes a feedback loop is created unintentionally. An example is the buying and selling behaviour of investors using the same investment advisory program.
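The remark that instability is related to delays and amplification factors within the loop can be illustrated with a toy simulation. The sketch below is not taken from the syllabus: the update rule, the function names `simulate_loop` and `is_settling`, and the chosen gains are all assumptions made for illustration. It models a deviation x from a setpoint that is corrected in proportion to a measurement that may be a few steps old.

```python
def simulate_loop(gain, delay, steps=60, x0=1.0):
    """Simulate x[k+1] = x[k] - gain * x[k - delay] and return the trajectory.

    x is the deviation from the setpoint. The correction uses a measurement
    that is `delay` steps old (a hypothetical toy model, not a real plant).
    """
    history = [x0] * (delay + 1)   # assume a constant deviation before the start
    for _ in range(steps):
        measured = history[-1 - delay]          # the (possibly outdated) sensor value
        history.append(history[-1] - gain * measured)
    return history


def is_settling(trajectory, tail=10, tolerance=1e-3):
    """Crude stability check: the last `tail` samples are all close to zero."""
    return all(abs(x) < tolerance for x in trajectory[-tail:])


# Same amplification factor, with and without one step of delay.
print(is_settling(simulate_loop(gain=1.5, delay=0)))
print(is_settling(simulate_loop(gain=1.5, delay=1)))
```

With these particular numbers the loop without delay settles, while the same gain combined with one step of delay produces a growing oscillation. Lowering the gain (for example to 0.5) makes the delayed loop settle again, which is the trade-off between delay and amplification described above.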


2. Sensor properties

To enable an information system to interact with the real world, we need to connect measuring instruments which observe the world and feed data into the system. In this chapter we will discuss various properties of measuring instruments and the way they can be put together. As sensing involves physics, this chapter will have a certain physical flavour.

2.1 Conversion of physical quantities into computer readable form

A basic block diagram of a measuring instrument connected to an information system is sketched in figure 2-1. The first component is a sensor to convert the physical quantity we are interested in into an electrical signal. For instance, for sound we need a microphone to convert variations in air pressure into an electrical signal. For images we may use a video camera to obtain a signal representing the brightness in the image when it is scanned line by line.

Figure 2-1 Basic model of a measuring instrument. Blocks shown: physical quantity, sensor, Analog-to-Digital Convertor, sampling generation, format (to bus), format (from bus), memory.

The next block represents the conversion of the electrical signal into digital numbers. This is realized by an Analog-to-Digital Convertor (ADC). The input range of the ADC is divided into a fairly large number of intervals of equal size $\Delta v$. The successive intervals are numbered to represent the quantized input. Thus, when the quantized signal has the integer value $k$, the corresponding value of the original signal was in the interval between $v_k$ and $v_{k+1}$: $v_k \le v < v_{k+1}$ with $v_k = k\,\Delta v$.


This process is illustrated in figure 2-2 for 8 quantization intervals. The number of quantization levels is in general a power of 2. When we have $n$ bits available the number of quantization levels is $2^n$. For example, when the number of bits $n = 8$ there are 256 intervals, and the resolution is said to be 1/256 (of full scale; see footnote 1).

Figure 2-2 Quantization process of a 3 bit ADC with 8 quantization levels (vertical axis: $v(t)$ in levels 0 to 7 of size $\Delta v$; horizontal axis: $t$). The successive quantized values of $v$ for $t$ = 1 through 6 are: 1, 3, 5, 6, 5, 4.
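The quantization rule $v_k \le v < v_{k+1}$ with $v_k = k\,\Delta v$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `quantize`, the clipping of out-of-range inputs, and the sample voltages are hypothetical (the voltages were merely chosen so that they reproduce the sequence of levels quoted for figure 2-2).

```python
import math


def quantize(v, v_max, n_bits):
    """Return the level k with k*dv <= v < (k+1)*dv for an ADC range [0, v_max)."""
    levels = 2 ** n_bits
    dv = v_max / levels                    # interval size (Delta v)
    k = math.floor(v / dv)
    return min(max(k, 0), levels - 1)      # assumption: out-of-range inputs are clipped


# Hypothetical samples at t = 1..6 for a 3-bit ADC with an 8 V input range.
samples = [1.2, 3.7, 5.1, 6.9, 5.5, 4.0]
print([quantize(v, v_max=8.0, n_bits=3) for v in samples])
```

A real converter also has to decide how to treat inputs outside its range; clipping to the lowest or highest level, as done here, is one common choice and not something the text prescribes.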

An important decision to be made is the number of quantization levels (and hence the number of bits) needed to represent the continuous signal. This number should be related to the noise (inaccuracy) present in the sensor signal: the inaccuracy introduced by the quantization process should be considerably smaller than the inaccuracy in the sensor signal itself. We will discuss this topic in section 2.3.

Another important issue is how frequently we should sample the continuous signal, as we can store only discrete events in the computer. In chapter 4 the Nyquist sampling theorem will be discussed; we will see that the sampling rate should be at least twice the maximum frequency present in the signal. In that case, the analog signal can be completely recovered from the sampled values. In fig. 2-1, the 'sampling generation' block takes care of the sampling. The sampling process can be quite complicated in video systems, where it has to be synchronized with the line-by-line scanning of the camera (section 2.5). The quantized signal is formatted and sent over a bus system to the computer. Sometimes blocks of data are stored temporarily in the memory of the measuring instrument.

Relatively simple measuring systems can be put together easily using off-the-shelf equipment. The selection of the sensor depends strongly upon the application. There is a wide variety of different sensors for all kinds of physical quantities. Programmable equipment to sample an electrical signal and read it into the computer is available from many vendors. Such equipment can be directly interfaced to the computer system through a standard bus.

Footnote 1: In this syllabus we adhere to the convention that the better or higher resolution is expressed by a smaller quantity. Thus 'resolution' can be associated with the smallest detectable difference in the quantity being measured.


Top to bottom: schematic representations of a digital-analog converter (DAC), an analog-digital converter (ADC) based on the principle of successive approximation, and a 'flash' ADC.


2.2 Sensor properties

Let $x$ be a physical quantity that we want to measure, and $y$ the voltage output of the sensor. We would like to have a linear relation between the output of the sensor $y$ and $x$:

$$y = S\,(x - x_0) \tag{2.1}$$

The following properties of a sensor can now be defined.

Sensitivity: The factor $S$ is called the sensitivity of the sensor. An equivalent definition is

$$S = \frac{dy}{dx} \tag{2.2}$$

Zero-point: $x_0$ is called the zero-point. It is the value of the sensor input for which the output $y$ is zero. For example, the zero-point of a temperature sensor is the temperature for which the output voltage is zero.

Offset: this is the output value of the sensor when the input is zero. So for example the offset of a light sensor is the output voltage when there is no light. For a linear sensor, the offset is obviously related to the zero-point (substitute $x = 0$ in 2.1):

$$y_{\mathrm{offset}} = -S\,x_0 \tag{2.3}$$

Figure 2-3 Illustration of sensor properties: sensitivity $S = dy/dx$, zero-point and offset (axes: $y$ versus $x$, with $x_{\mathrm{min}}$, $x_0$ and $x_{\mathrm{max}}$ marked on the $x$ axis).
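The definitions of sensitivity, zero-point and offset in (2.1) to (2.3) can be collected in a small model of a linear sensor. The class below is a minimal sketch: the name `LinearSensor`, its methods, and the numerical values of the example temperature sensor are assumptions made for illustration, not data from a real device.

```python
from dataclasses import dataclass


@dataclass
class LinearSensor:
    sensitivity: float   # S in (2.1): output units per unit of the measured quantity
    zero_point: float    # x0 in (2.1): input value for which the output is zero

    def output(self, x):
        """Sensor output y = S * (x - x0), equation (2.1)."""
        return self.sensitivity * (x - self.zero_point)

    @property
    def offset(self):
        """Output at zero input, y_offset = -S * x0, equation (2.3)."""
        return -self.sensitivity * self.zero_point

    def measured_value(self, y):
        """Invert (2.1) to recover the physical quantity x from the output y."""
        return y / self.sensitivity + self.zero_point


# Hypothetical temperature sensor: 10 mV per degree, zero output at -50 degrees.
sensor = LinearSensor(sensitivity=0.01, zero_point=-50.0)
print(sensor.offset, sensor.output(25.0), sensor.measured_value(0.75))
```

For this hypothetical sensor the offset is about 0.5 V, and an output of 0.75 V corresponds to an input of about 25 degrees.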

Another sensor property is its measurement range, given by the minimum and maximum value of the physical quantity ($x_{\mathrm{min}}$ and $x_{\mathrm{max}}$) for which there is a meaningful output of the sensor.

Accuracy: the accuracy is the uncertainty $\Delta x$ in the sensor value. Often it is given as a fraction of the measurement range: $\Delta x / (x_{\mathrm{max}} - x_{\mathrm{min}})$. When the accuracy $\Delta x$ increases with the value of $x$, it is customary to specify it as a relative accuracy: $\Delta x / x$ (in this situation, we usually have $x_{\mathrm{min}} = 0$).

Non-linearity: the maximum deviation from the linear relation (2.1) is the non-linearity, mostly expressed as a fraction of the measurement range.


We have mentioned the desirability of a linear relationship between the physical quantity and the sensor output. This was certainly true in the past, when there were no computerized measuring instruments. Today, however, a linear relation makes life easy but is not strictly needed. As long as there is a known monotonic functional relation $y = f(x)$ between $y$ and $x$ over the measurement range, we can use the sensor. The inverse $x = f^{-1}(y)$ must be stored in a table in the measuring instrument or in the computing system, so that for each measured output $y$ we can look up the corresponding value $x$. Evidently, in the case of a non-linear relationship the sensitivity varies over the measurement range.

Input range: this is the range of input values for which the sensor operates according to the specifications. For input values beyond this range linearity may no longer be guaranteed, for instance.

Input limit(s): if the input value exceeds these limits the sensor will probably be damaged.

2.3 Uncertainty in the sensor value

When we repeatedly measure the output of a sensor (transducer) under the same conditions, the result will never be exactly the same. Small variations, called noise, are present in the sensor signal. Noise finds its origin in the physical properties of the sensor (thermal fluctuations and quantum effects), or results from external disturbances. The existence of noise is responsible for the fact that there is a basic and unavoidable uncertainty in the result of any measurement.

Let us consider a sensor with linear response, i.e. a relationship $y = Sx$ exists between the output value $y$ and the input value $x$ (whether or not there is an offset is not relevant in the present discussion). Now, if the sensor adds noise with standard deviation $\sigma_{\mathrm{onoise}}$ to its output signal, then it seems as if we are looking with a noise-free sensor at an input signal containing noise with standard deviation $\sigma_{\mathrm{inoise}} = \sigma_{\mathrm{onoise}} / S$ (see Appendix B).
Let $x_{\mathrm{min}}$ and $x_{\mathrm{max}}$ be the limits of the sensor's input range. The sensor can only discriminate between input signal values differing by an amount of order $\Delta x = \sigma_{\mathrm{inoise}}$. The sensor's dynamic range $R$, expressed in decibels, is the logarithm of the number of virtual accuracy steps $\Delta x$ corresponding to the input range $x_{\mathrm{max}} - x_{\mathrm{min}}$:

$$R = 20 \log_{10} \frac{x_{\mathrm{max}} - x_{\mathrm{min}}}{\Delta x}\ \mathrm{dB} \quad\text{with}\quad \Delta x = \sigma_{\mathrm{inoise}}. \tag{2.4}$$

A similar quantity can be defined for a signal $s$ containing noise. If we compare the standard deviation $\sigma_{\mathrm{noise}}$ of the noise in the signal to the signal range $s_{\mathrm{max}} - s_{\mathrm{min}}$, we obtain the signal-to-noise ratio (S/N) as

$$S/N = 20 \log_{10} \frac{s_{\mathrm{max}} - s_{\mathrm{min}}}{\sigma_{\mathrm{noise}}}\ \mathrm{dB}.$$
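Both the dynamic range (2.4) and the signal-to-noise ratio have the same form, 20 times the base-10 logarithm of a range divided by a noise standard deviation. The helper below is a small sketch of that calculation; the function names and the example sensor (a 0 to 100 degree range with 0.1 degree of input-referred noise) are hypothetical.

```python
import math


def range_to_db(span, sigma):
    """20*log10(span / sigma): dynamic range (2.4) or S/N ratio, in dB."""
    return 20.0 * math.log10(span / sigma)


def db_to_steps(db):
    """Inverse relation: the number of distinguishable steps for a value in dB."""
    return 10.0 ** (db / 20.0)


# Hypothetical sensor: input range 0..100 degrees, input-referred noise 0.1 degree.
print(range_to_db(100.0 - 0.0, 0.1))
print(db_to_steps(50.0))
```

The hypothetical sensor distinguishes about 1000 steps, which corresponds to a dynamic range of about 60 dB; conversely, 50 dB corresponds to roughly 316 steps.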

Consider now the output signal of our noisy sensor while sensing a noise-free signal. It seems as if we are looking with a noise-free sensor at an input signal with effective signal-to-noise ratio

$$S/N = 20 \log_{10} \frac{s_{\mathrm{max}} - s_{\mathrm{min}}}{\sigma_{\mathrm{inoise}}}\ \mathrm{dB}.$$

Evidently, this signal-to-noise ratio is maximal when $s_{\mathrm{max}} - s_{\mathrm{min}}$ is as large as possible, i.e. when the signal's amplitude matches the sensor's input range $x_{\mathrm{max}} - x_{\mathrm{min}}$. In this case, the signal-to-noise ratio of the sensor's output signal $y$ equals

$$S/N = 20 \log_{10} \frac{x_{\mathrm{max}} - x_{\mathrm{min}}}{\sigma_{\mathrm{inoise}}} = 20 \log_{10} \frac{y_{\mathrm{max}} - y_{\mathrm{min}}}{\sigma_{\mathrm{onoise}}}\ \mathrm{dB}.$$

In order to translate the sensor's output voltage $y$ into a sequence of digital numbers, suitable for being processed in a computer, an analog-to-digital converter (ADC) is used (section 2.1), which at fixed time intervals produces a number representing the quantization level which best approximates the current $y$ value. An ADC introduces so-called quantization noise whose effective standard deviation $\sigma_{\mathrm{qnoise}}$ is (Appendix B):

$$\sigma_{\mathrm{qnoise}} = \frac{\Delta}{\sqrt{12}} \quad\text{with}\quad \Delta = \frac{y_{\mathrm{max}} - y_{\mathrm{min}}}{2^n} \tag{2.5}$$

Here $\Delta$ is the discretization interval; $n$ is the number of bits used by the ADC; $y_{\mathrm{min}}$, $y_{\mathrm{max}}$ delimit the ADC's input range, which is taken here to coincide with the sensor's output range (see footnote 2). In table 2.1 S/N values for $n$-bit A/D converters are given.

$2^n$     $(y_{\mathrm{max}} - y_{\mathrm{min}}) / \sigma_{\mathrm{qnoise}}$     S/N ratio in dB
2         7                                   17
4         14                                  23
8         28                                  29
16        55                                  35
32        111                                 41
64        222                                 47
128       443                                 53
256       887                                 59

Table 2.1. S/N ratio for quantization noise
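The entries of table 2.1 follow directly from (2.5): the ratio of the input range to the quantization noise is $2^n \sqrt{12}$, and the S/N ratio is 20 times its base-10 logarithm. A short sketch that recomputes the table is given below; the function name `quantization_snr` is an assumption made for illustration.

```python
import math


def quantization_snr(n_bits):
    """Return (levels, range/sigma_qnoise, S/N in dB) for an n-bit ADC, from (2.5)."""
    levels = 2 ** n_bits
    ratio = levels * math.sqrt(12)         # (y_max - y_min) / sigma_qnoise
    return levels, ratio, 20 * math.log10(ratio)


for n in range(1, 9):
    levels, ratio, snr_db = quantization_snr(n)
    print(f"{levels:4d} {ratio:6.0f} {snr_db:5.0f}")
```

Each additional bit doubles the ratio and therefore adds about 6 dB to the S/N ratio, which is the step visible between successive rows of the table.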

Evidently we want the quantization noise to be at most of the same order of magnitude as the sensor output noise. Thus, to estimate the appropriate number of bits for the A/D converter, we put $\sigma_{\mathrm{qnoise}} < \sigma_{\mathrm{onoise}}$.

Footnote 2: This is the ideal situation. In practice it can often only be attained by adapting the sensor to the ADC using an instrumentation amplifier, which will also contribute to the overall noise level of the sensor signal.


Example: A video camera, a brightness ($b$) to voltage ($v$) transducer, has according to the manufacturer's specifications a 50 dB signal-to-noise ratio under good conditions. Hence

$$50 = 20 \log_{10} \frac{v_{\mathrm{max}} - v_{\mathrm{min}}}{\sigma_{\mathrm{onoise}}}, \quad\text{or}\quad \sigma_{\mathrm{onoise}} = \frac{v_{\mathrm{max}}}{316}$$

because $v_{\mathrm{min}} = 0$ in this case (zero offset). For digitizing the video signal, we choose the effective discretization step size such that

$$\frac{\Delta}{\sqrt{12}} < \sigma_{\mathrm{onoise}}, \quad\text{hence}\quad \Delta = \frac{v_{\mathrm{max}}}{2^n} < \sqrt{12}\,\sigma_{\mathrm{onoise}} = \frac{v_{\mathrm{max}}}{91},$$

and $n > \log_2 91 \approx 6.5$, so at least 7 bits are needed. By choosing 8 bits, a common practice in commercial video digitizers, the quantization noise is more than a factor two smaller than the noise in the video signal (about 2.8 times, since $887/316 \approx 2.8$; see table 2.1). Notice that we have assumed throughout the calculation that all signals are properly matched: we fit the CCD camera to the given lighting conditions, possibly by using an automatic iris, and by using an amplifier with a properly chosen amplification factor we take care that the sensor's output voltage range matches the ADC's input range.
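The reasoning of this example can be turned into a small calculation that gives the minimum number of bits for any specified signal-to-noise ratio. The sketch below assumes, as the example does, that the sensor's output range exactly matches the ADC's input range; the function name `bits_needed` is hypothetical.

```python
import math


def bits_needed(snr_db):
    """Smallest n for which sigma_qnoise < sigma_onoise, given the sensor S/N in dB.

    Assumes the sensor output range matches the ADC input range, so that
    sigma_onoise = range / 10**(snr_db/20) and sigma_qnoise = range / (2**n * sqrt(12)).
    """
    ratio = 10 ** (snr_db / 20)            # range / sigma_onoise, e.g. 316 for 50 dB
    min_levels = ratio / math.sqrt(12)     # we need 2**n to exceed this value
    return max(0, math.floor(math.log2(min_levels)) + 1)


print(bits_needed(50))   # the video camera of the example
```

For the 50 dB camera this reproduces the 7 bits found above; choosing 8 bits then leaves a safety margin of one bit.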

2.4 Sensor types

Signal domain        Physical properties
Radiant signals      light intensity, wavelength, polarization, phase, reflectance, transmittance
Mechanical signals   position, distance, velocity, acceleration, force, torque, sound pressure
Thermal signals      temperature, specific heat, heat flow
Electrical signals   voltage, current, charge, resistance, inductance, capacitance, dielectric constant, electric polarization, frequency, pulse duration
Magnetic signals     field intensity, flux density, moment, magnetization, permeability
Chemical signals     composition, concentration, reaction rate, toxicity, pH

Table 2.2 Physical properties in the signal domains, after Middelhoek et al. [2.2]

There are sensors for measuring all kinds of physical entities: distance, small displacements, temperature, forces and torques, velocity, acceleration, flow and all kinds of radiation. Obviously, which sensors are used depends strongly upon the application. For a multi-media information system we need the possibility to input sound and images, so a microphone and a scanner or a video camera are needed. For a system to monitor air pollution and smog formation, the measurement of particles and certain gases in the air, together with the weather conditions, is important.


A sensor has to transform a signal from the outside world into an electrical signal. Following Lion [2.1], six different domains can be distinguished in these signals from the outside world: radiant signals, mechanical signals, thermal signals, electrical signals, magnetic signals and chemical signals. Physical properties of importance in these different signal domains are listed in table 2.2. For any sensor, the conversion from one signal domain to another is based on one of the many existing physical and chemical effects and measurement principles that have been developed. Because of the immense number of measuring principles and devices, reviews of this field often have an encyclopedic character.

Signals are carried by some form of energy. Sensors that transform this incoming energy into the electrical energy of the sensor output are called self-generating or active sensors. No additional source of energy is needed to obtain the measured sensor signal. Examples are a solar cell, converting light energy into an electrical signal for measuring illumination, or a piezo-electric microphone converting the mechanical energy of acoustical waves into an electrical signal.

When an additional energy source is needed for the operation of the sensor, we call the sensor a modulating or passive sensor. The energy source is modulated by the measured quantity. Examples are: a linear potentiometer (variable resistor) used for measuring translations; an angular position decoder, which counts the number of holes in a rotating disc by interrupting a light beam; a Hall-effect sensor measuring a magnetic field, in which case a current source is modulated by the magnetic field.

A sensor has in general a spatial resolution: it measures a certain physical quantity at a certain location. When we measure the temperature, we do so at (or around) the place where the sensor is.
As to their spatial extension, point, line and area sensors can be distinguished, which produce a single value, a profile or an image of the measured quantity.

It is usually important that sensors be robust, small and low-cost. With the development of low-cost microelectronics devices, new sensors can open new markets. Examples are sensors for the detection of the quality of food; for the consumption of gas and electricity; for the continuous inspection of correct operation of all kinds of systems, such as street illumination; for the identification of persons and goods.

In sensor technology, the material and the measurement principle used play an important role. In particular, the development of solid-state sensors based upon silicon is promising (Middelhoek et al. [2.2]). The use of silicon not only makes it possible to apply the well-developed production methods of integrated circuits to sensor production, but also makes it feasible to combine the sensing and the processing of the sensor signal on a single chip. This gives the possibility to improve the characteristics of a sensor at a much lower price and with better performance than with discrete components. Sensors combining sensing and processing are often called 'smart sensors'.

Conversion of an electrical signal into one of the other signal forms from table 2.2 takes place at the actuator or output-transducer side. Examples are a display tube, in which a conversion to radiant energy takes place, or a loudspeaker, which transforms the electrical signal to mechanical energy of acoustic waves. Output transducers have undergone continuous development as well. Here too there is a demand for robust and low-cost devices, as actuators represent the major cost factor in most systems.

Sensor principles

We will now briefly review the principles used to build sensors for the five non-electrical signal domains, and then discuss three important cases in more depth: image sensors from the radiant domain, and the measurement of position and of sound from the mechanical domain.

Radiant signals

Besides visible, infrared and ultraviolet light, electromagnetic radiation also includes radio waves (including 'microwaves'), X-rays and gamma rays. They differ in wavelength, ranging from $10^4$ m for long radio waves to $10^{-14}$ m for gamma rays. The wavelength of visible light is between 400 nm (violet) and 700 nm (red). A different form of radiation is nuclear-particle radiation, which includes alpha, beta, and other particles. In this text we will concentrate on visible light.

Solid-state sensors for (visible) light are mainly based on the photoelectric effect, which converts light particles (photons) into electrical charge. The absorption of photons by the lattice of the sensor material (mostly silicon) creates electron-hole pairs, which upon collection realize the transformation of radiant energy into electrical energy. Examples of these sensors are photoconductors, photodiodes and phototransistors.

Mechanical signals

There is an important difference between sensors that measure position with or without mechanical contact with the real world. The measurement of positions in images using image processing techniques has given the possibility to measure positions remotely. Various physical principles are exploited for measuring position or proximity, including inductive, capacitive, resistive and optical techniques.
Force and pressure cannot be measured directly. First a force or pressure has to be converted to a displacement, and the displacement can then be measured with one of the techniques described above.

Thermal signals

The resistance of a metal or a semiconductor depends upon temperature. This relation is well known and is exploited for temperature sensing. The base-emitter voltage of a bipolar transistor is also temperature dependent, and is used in many commercially available low-cost temperature sensors.


Self-generating temperature sensors can be obtained using the Seebeck effect. When two wires made from different metals are welded together at one point, and this junction is heated or cooled with respect to the remaining parts of the so-called thermocouple, a voltage appears between the open ends. For small temperature differences, the voltage is proportional to the temperature difference.

Magnetic signals

Most low-cost magnetic sensors are based on the Hall effect. When a magnetic field is applied to a conductor in which a current flows, a voltage difference arises across the conductor in a direction perpendicular to both the current and the magnetic field. Because this effect is quite substantial in semiconductors, semiconductor Hall plates are low-cost and are used in many commercially available devices. Many materials change their resistivity upon application of a magnetic field. This so-called magnetoresistance effect can also be exploited for building magnetic sensors.

Electrical signals

Many phenomena of interest to an information system are electrical by nature, for instance biomedical signals (EEG, ECG), radio signals and many more. The equipment needed to convert, for instance, the tiny potential differences occurring at EEG electrodes to larger voltages, at a different impedance level, without interference from surrounding electromagnetic fields, while avoiding the risk of the patient's electrocution, forms a special category of sensors.

Chemical signals

For monitoring the environment, the measurement of specific components within gas mixtures is necessary. This has strongly motivated research into miniature low-cost (and possibly disposable) chemical sensors. The chemical signal can be converted directly to an electrical signal, or first converted into an optical, mechanical or thermal signal, which is then converted into an electrical signal.
As an example, a sensor for measuring the CO concentration in air can be built by determining the attenuation of an infrared beam. As CO absorbs IR light, the attenuation is a measure of the concentration. Many chemical sensors are based on measuring the change of the conductivity or the dielectric constant of a material when it is exposed to a gas or electrolyte. Such a material can be a metal oxide. For instance, the electrical conductivity of heated tin dioxide changes with the concentration of methane. In this way a sensor for the presence of gas can be built. Many organic materials also change their conductivity when exposed to a gas. However, since the conductivity of these materials is very low, they are hard to use. Chemical sensors exist for the measurement of many gases such as carbon monoxide (CO), carbon dioxide (CO2), oxygen (O2) and ozone (O3). Sensors for humidity and acidity (pH) also belong to this type. A disadvantage of most chemical sensors is that they are sensitive not to just one chemical measurand but usually respond to many, which makes it necessary to use these sensors under well-defined conditions.

An important class of chemical sensors are biosensors. One type of biosensor is the acoustic biosensor [2.4]. In such a sensor a vibrating quartz crystal is coated with a biochemical which is specific for the matter to be detected. When this coating, such as an antibody, binds to the matter to be measured, such as an antigen, the mass of the coating increases. This leads to a change in the resonance frequency of the quartz crystal, as the resonance frequency is directly related to the mass. The frequency change can be measured accurately. In this way a sensitive and specific sensor can be realized.

2.5 Image sensors

The requirements with respect to absolute accuracy are in general less stringent for image sensors than for measurement sensors. The information is, however, much more complex. Image sensors are built as an array of brightness sensors, which are electronically scanned. The spatial extension of the elementary sensors should be small, so as to avoid overlap. In fact there are three places in an imaging system where scanning can take place: in the illumination, in the object / sensor positioning and in the sensor itself. A scene can be scanned with a single light beam while the reflected light is measured using a single elementary brightness sensor. This technique is applied in laser scanners, to obtain both a brightness image and an image representing the distance to objects. This may involve slow mechanical scanning and is in general expensive. Scanning can also be obtained by moving the sensor over the scene. This method is applied in flat-bed scanners, where a line image sensor is combined with one-dimensional mechanical motion to access successive lines of the image. In an array image sensor the two-dimensional scanning process is completely electronic.

A good design of a vision system involves an optimal choice of illumination, optical system and sensor in relation to the material properties of the object or the scene to be measured [2.5, 2.6, 2.7]. When human perception is involved, the fact that physically measured wavelength and perceived colour are quite different things should be taken into account. We will now briefly discuss colour perception and then, in some more detail, video systems, because these, together with flat-bed scanners, are the most frequently used image input devices for information systems.


2.5.1 Colour perception

In general we deal with light that is composed of a mixture of different wavelengths. The question is how we perceive such a mixture. We can imagine that the human eye has three types of colour receptors, each with its own spectral response. One type has its maximum sensitivity for red light, one for green and one for blue. Depending upon the spectral composition and intensity of the incoming light, these three receptors are stimulated to different degrees, leading to the perception of a certain colour. Psychophysical evidence shows that every perceivable colour may be generated by combining red, green and blue lights. However, the spectral composition that produces a given colour is not unique. Spectral compositions giving rise to the same colour are called metamers. Hence for reproducing a perceived colour, it is not necessary to reproduce the original mixture. It is sufficient that the receptors are stimulated by a metamer of the original mixture.

2.5.2 Video norms

Video systems originate from the entertainment industry. This market has set the standards for video systems: the American EIA norm and the European CCIR norm. In the EIA norm a video image consists of 525 lines, with 30 image frames per second. In the CCIR norm a video image consists of 625 lines, with 25 image frames per second. To prevent flickering of the image during display due to the relatively low repetition frequency of the images, video systems are interlaced. This means that an image is split into two fields: one consisting of the odd image lines and the other consisting of the even image lines. The lines of one field are thus displayed in between the lines of the previous field, and the resulting local repetition frequency of the image is twice as high as when the complete frame is displayed at once.

Figure 2-5 CCIR video system. Interlacing (a): lines 1, 3, 5, ..., 623, 625 form the first (odd) field and lines 0, 2, 4, ..., 622, 624 the second (even) field. Effective scanned area (b).


This is illustrated in fig. 2.5a for the CCIR norm (with fields of 312.5 lines). The video signal represents the brightness of the image along the lines of the fields and also contains synchronization pulses, which indicate the beginning of a line and of a field. These synchronization pulses also take up time in the video signal (the so-called retrace time, which is also needed for the display device to position the writing beam at the beginning of the next line). This results in a smaller effective scanned area than would be expected from the given number of lines and times. This effective area, where real image data is transmitted, is shown in figure 2.5b and is 74% of the total time (and so of the area).
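The 74% figure can be checked with a short calculation. The sketch below uses the CCIR timing quoted in section 2.5.4 (a line period of 64 µs of which 51.7 µs is scan time, and 576 image lines out of 625); the function and variable names are only illustrative.

```python
# CCIR timing figures as quoted in this chapter.
LINE_PERIOD_US = 64.0   # full line period
SCAN_TIME_US = 51.7     # part of each line that carries image data
TOTAL_LINES = 625       # lines per frame
ACTIVE_LINES = 576      # lines that carry image data


def effective_fraction(scan_us, period_us, active_lines, total_lines):
    """Fraction of the frame time during which image data is transmitted."""
    horizontal = scan_us / period_us        # loss due to horizontal retrace
    vertical = active_lines / total_lines   # loss due to vertical retrace
    return horizontal * vertical


fraction = effective_fraction(SCAN_TIME_US, LINE_PERIOD_US,
                              ACTIVE_LINES, TOTAL_LINES)
print(f"effective scanned area: {fraction:.1%}")
```

The horizontal factor is about 0.81 and the vertical factor about 0.92, whose product is the 74% mentioned above.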

2.5.3 Solid state video sensors

A video sensor has three important functions:
- light to charge conversion,
- spatial accumulation of charge carriers,
- signal reading.

A solid state video sensor consists of an array of photo-sensitive sites. Charges are created by the photoelectric effect, which frees electrons as a result of the illumination. The amount of charge accumulated at a photo site is a linear function of the local incident illumination and of the integration time. The scanning and signal reading are based on the principle of Charge-Coupled Devices (CCD), which are basically analogue shift registers. Small amounts of electrical charge called 'packets' are stored at specific locations in the silicon semiconductor material. These locations, called storage elements, are created by the field of a pair of gate electrodes close to the surface. By placing the storage elements close together with some overlap between adjacent elements, a charge packet can pass from one storage element to another. This transfer of a packet is realized by alternately raising and lowering the voltage on adjacent gate electrodes.

In figure 2.6 the lay-out is sketched of a popular CCD solid state video sensor using the frame transfer method. This sensor is divided into an image section and a storage section [2.15]. The accumulation of charges takes place in vertical CCD registers, and the charges are transferred to the storage section (dashed) during the vertical retrace time. Then the accumulation starts again at the photo sites while the storage section is read out line by line. The storage section is shifted line by line into the horizontal read-out register (lower section), from which the video signal is obtained after amplification and addition of the synchronization pulses.


Figure 2-6 Layout of a frame transfer solid state sensor

The spectral response of a solid state sensor peaks around 800 nm. The decrease for shorter wavelengths is a result of the transmission properties of the electrodes covering the sensor. The decrease for longer wavelengths results from the deeper penetration of the infra-red photons into the silicon. This gives rise to charge carriers in the substrate that do not contribute to the charge collection of the photo sites. Another effect of the longer travel of infra-red photons is a decrease of the resolution for longer wavelengths, because charge carriers can result from incident illumination of neighbouring sites. This effect is reduced by the use of an infra-red blocking filter.

In a solid state sensor the spatial accumulation of charges is separated from the signal read-out. This makes it possible to use an accumulation time different from the read-out time. In the high-speed shutter option the accumulation time is reduced. This makes the sensor less sensitive, but because of the short accumulation time the motion blur can be considerably reduced. For example, the individual water drops of a waterfall become visible. We can also do the opposite: enlarge the accumulation time. This makes the sensor more sensitive and useful in bad illumination conditions. This enlargement is however limited by thermal noise. Therefore cooled solid state sensors are sometimes applied in low-light applications. Temperature is an important factor. The storage-related parameters degrade rapidly at temperatures above 70 °C (thermal relaxation).

In video cameras for the consumer market, single-chip colour sensors are realized by glueing a colour filter onto the chip. This reduces the resolution of the sensor by a factor of 3. When colour is not important, a black and white camera gives the highest resolution for the same price! In professional video cameras three solid state sensors are used for the three primary colours, and there is no reduction in resolution.


Solid-state cameras have no distortion of the picture geometry, nor burn-in or lag. However, when a very bright spot is present in the image, the CCD registers onto which this spot is projected saturate and bright columns appear in the image. Solid state sensors are small, lightweight and mechanically rugged. Consumer cameras require a minimum illumination of around 1 - 3 lux.

Resolution: video sensors developed for the consumer video market have sizes around 600 x 576 pixels. The organization and set-up of the array sensors is of course largely determined by the video norms for this market. Special array sensors for image processing applications are also available. An example is the Megaplus camera [2.16] with square pixels and a resolution of 1340 x 1037 pixels (Megaplus is a trademark of the Videk Company).

Signal to noise ratio: this depends upon the illumination but ranges from 50 dB up to 64 dB in commercial devices.

Besides these common properties, the following properties are also found in specifications of solid state sensors:

Total Photo Response Non-Uniformity (PRNU): the difference in response level between the most and least sensitive elements under uniform illumination.

Picture element defects: the number of defective photo sites in the sensor. In a consumer video solid state sensor 604 columns x 575 lines, or in total about 350,000 photo sites, are present. At this moment array sensors with fewer than 10 defects in an image are commercially available.

2.5.4 Video digitizers (frame-grabbers)

A video signal has to be digitized by a video digitizer before it can be processed by a computer. Several commercial video digitizers exist to input a video signal into a computer system. Digitizing an image frame of a CCIR video signal takes 40 ms. A sample frequency of 14.8 MHz is necessary to obtain square pixels (picture elements) in the CCIR system.
Because of the retrace time the effective scanning area (CCIR) is 768 pixels on a line and 576 lines in an image (for square pixels). This is illustrated in figure 2-5b. The line period of the CCIR system is 64 µs (15625 Hz), of which 51.7 µs is the horizontal scan time and 12.3 µs is the retrace time.

Three main functions are present in a video digitizer: A/D conversion, synchronization and image storage. The video digitizer converts an analog video signal into digital values. The number of bits required depends on the signal-to-noise ratio of the image sensor. This ratio depends among other things upon the illumination and is in the order of 50 dB. This corresponds to the 8 bits present in most commercial video digitizers.

The synchronization of the sampling instants of the video digitizer with the scanning of the video source is one of the most crucial parts of a video digitizer. When the video source is free-running, the video digitizer has to adjust its sample clock to the external source, so that a fixed number of sample points fall into each line (defined by the interval between two line-synchronization pulses in the video signal). When we want square pixels in the digital image of a CCIR norm video signal, there must be 956 pixels on a line. This means that the sample clock cannot be fixed but must be adjusted to the video signal. In particular, when the video source is a video recorder, line and frame frequency may vary considerably and such an adjustment is essential. When the (solid state) sensor device delivers not only a video signal but also its pixel (scan) clock, the A/D conversion can take place completely synchronously with the scanning of the photo sites, and each sample point in the digital image then corresponds to one photo site in the solid state sensor.

The digitized image is stored in a video memory in the video digitizer. Often this video memory can also be displayed. When the video digitizer logically resides on a processor bus, the video memory may be mapped into the working space of the processor. Image processing may take place on this stored image in the video memory. However, the peculiarities of such a video memory then have to be taken into account in all image processing routines. There are also video digitizers which use an interface bus to the computer system to transfer (parts of) images to the processor memory.

It is good to make a clear distinction between square pixels (photo sites) of a solid state sensor and square pixels in a digital image. Square pixels of a solid state sensor are a result of the geometry of the lay-out of the photo sites. The image values of the photo sites constitute the video signal at the rate of the pixel scan clock in the sensor. Square pixels in the digital image result from the fixed number of sample moments between two successive line pulses in the video signal, as defined by the rate of the sample clock in the video digitizer.
Only when the scan clock rate in the sensor is the same as the sample clock rate in the video digitizer does a one-to-one relationship exist between a photo site in the sensor and a pixel in the digital image, and only then will a sensor with square 'pixels' produce square pixels in the digital image! For these sensors the video digitizer needs the pixel-scan clock in addition to the video signal. When we have no pixel-clock connection, the sampling clock of the video digitizer defines the length of the pixels. So when this clock rate is 14.8 MHz we have square pixels in the digital image, even when the photo sites are rectangular. In this case a requantization of the photo sites occurs. When for instance the photo sites are larger than the pixel length defined by the digitizer clock, some photo sites are sampled twice and some are sampled only once. As the video signal in general passes a low-pass filter within the solid state sensor, the requantization amounts to an interpolation.


Figure 2-7 Requantization due to different pixel and sample clocks. Panels: CCD photo sites against position; analog video signal against time; sampled video signal against time; pixels of the digital image compared to the photo sites.
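The requantization effect can be imitated with a few lines of code. The sketch below is a simplification under stated assumptions: the low-pass filter in the sensor is ignored, so every digitizer sample simply takes the value of the photo site being read out at that instant, and the line of six photo sites with eight samples per line is an invented example.

```python
def requantize(photo_sites, n_samples):
    """Sample one line of photo-site values with a different sample clock.

    Returns the index of the photo site hit by each sample and the
    sampled values (no low-pass filtering, for simplicity).
    """
    n_sites = len(photo_sites)
    indices = [(k * n_sites) // n_samples for k in range(n_samples)]
    return indices, [photo_sites[i] for i in indices]


line = [10, 20, 30, 40, 50, 60]          # six wide photo sites (example data)
indices, samples = requantize(line, 8)   # eight digitizer samples per line
print(indices)
print(samples)
```

With these numbers photo sites 0 and 3 are sampled twice and the other sites once, which is the situation sketched in figure 2-7.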

2.5.5 Scanners

Scanners are used for digitizing photographs, drawings and hand-written or printed text. In the commonly used flat-bed scanners, the image is electronically scanned across its width by a linear CCD array containing some 2500 photo sites (for a 300 dpi A4 scanner). Scanning in the other direction is done by moving the CCD array slowly parallel to itself underneath the glass plate on which the original image is placed. Typically the resolution of a desktop scanner is 300-400 dots per inch (dpi). Sometimes a scanner of this type has a provision for higher resolution (up to 1600 dpi); this is however artificially created by an interpolation algorithm, which evidently cannot increase the amount of information obtained at the basic resolution of the scanner. Colour scanners usually use three colour filters in combination with a single CCD array. The original is scanned three times in this case, once for each colour.

The resolution of desktop scanners is well matched to the capabilities of other 'desktop publishing' equipment. Laser printers with a basic resolution of 600 dpi can produce halftone images only by 'dithering' techniques, at a resolution effectively limited to 150 dpi. For the dithering calculation, two to four times as many input dots are needed. Also, a postcard scanned at 300 dpi with 24 bits colour gives rise to 6 Mbytes of data, which is just a manageable amount in terms of disk space and computation time on today's workstations. As discussed in section 2.5.3, an economy-class CCD array has a signal-to-noise ratio of 50 dB, corresponding to 8 bits of information per sample. This is sufficient for most desktop applications, where 8-bit gray values or 24-bit colour is standard.

For high-quality reproduction work in the printing industry, a resolution of 400 dpi is insufficient; here flatbed or drum scanners are used with resolutions up to 3000 dpi.
One reason for using so high a resolution is that in the reproduction printing process the rasters corresponding to the various colours have to be shifted and rotated with respect to each other to prevent smearing and the appearance of moiré patterns. In order to make the scanner's high spatial resolution effective, the dynamic range of the colour channels should be increased as well. In industrial flatbed scanners, the use of selected CCD arrays and a very stable mechanical construction leads to 10-12 bits per colour channel. It is desirable to use a CCD array whose individual photo sites are equal to within a quantization step; otherwise unwanted parallel lines will appear in the scanner's (virtual) output image. Unfortunately this is difficult to obtain in high-resolution scanners. One can cope with this problem by a software calibration process based on the signal resulting from scanning a test image of uniform density. In industrial drum scanners, this problem is avoided. The image is scanned by a laser beam illuminating a rotating drum on which the picture is mounted. Photomultipliers are used for measuring the reflected light, with a dynamic range of 120 dB corresponding to 20 bits per colour channel.
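The figures quoted in this section can be reproduced with some simple arithmetic. The sketch below makes a few assumptions that are not in the text: A4 paper is taken to be 210 mm wide, a postcard is taken to be 10 cm x 15 cm, and the number of bits is taken as the logarithm of the amplitude ratio only (about 6 dB per bit), ignoring any constant offset for quantization noise.

```python
import math

MM_PER_INCH = 25.4


def samples(length_mm, dpi):
    """Number of samples along a length at the given resolution."""
    return round(length_mm / MM_PER_INCH * dpi)


def image_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size of a scanned image in bytes."""
    return samples(width_mm, dpi) * samples(height_mm, dpi) * bits_per_pixel // 8


def snr_db_to_bits(snr_db):
    """Bits needed so that 2**bits levels span an amplitude ratio of snr_db."""
    return snr_db / (20 * math.log10(2))


a4_sites = samples(210, 300)                      # assumed A4 width of 210 mm
postcard_bytes = image_bytes(100, 150, 300, 24)   # assumed 10 cm x 15 cm postcard

print(a4_sites)                  # photo sites across an A4 page at 300 dpi
print(postcard_bytes / 1e6)      # size of the scanned postcard in Mbytes
print(snr_db_to_bits(50), snr_db_to_bits(120))
```

Under these assumptions the A4 width needs 2480 photo sites (the 'some 2500' of the text), the postcard takes about 6.3 Mbytes, and 50 dB and 120 dB correspond to about 8.3 and 19.9 bits respectively.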

2.6 Mechanical signals

2.6.1 Position sensors

Several physical principles are exploited to create position sensors. In table 2.3 the most popular sensors are listed; they are discussed briefly below. For a more extensive discussion see for instance Reijers et al. [2.3].

The LVDT (Linear Variable Differential Transformer) and the resolver are based upon the principle of electromagnetic induction. With the LVDT linear displacements can be measured. A core is moved within a special transformer of which the output voltage varies linearly with the position of the core. With a resolver the angular rotation can be measured of the rotary shaft on which it is mounted. Stator and rotor windings of the resolver are driven by a two-phase clock. The phase between the stator and rotor signal is measured and converted to an angular position.

An eddy current sensor is also based upon the inductive principle and is used for contactless measurement of the distance to a conducting object. It induces currents in a nearby conductor, which results in energy losses. This effectively reduces the sensor impedance, which varies almost linearly with the distance to the conductor. The effective range is short, about 10 millimeters. Although the accuracy is about 0.1%, differential motions of 0.03 mm can easily be detected.


sensor            principle    range                      accuracy                contact / non-contact   remarks
LVDT              inductive    1 mm - 30 cm               0.25 %                  contact                 -
resolver          inductive    360˚                       0.3˚                    contact                 -
eddy current      inductive    0.1 mm - 6 cm              0.5 %                   non-contact             only for conducting objects
LVDC              capacitive   2.5 mm - 25 cm             0.01 %                  contact                 -
strain gauge      resistive    length x 10^-6             depends on electronics  contact                 mostly used for force measurement
ultra-sound       acoustical   30 cm - 10 m               -                       non-contact             accuracy is temperature dependent
absolute encoder  optical      360˚                       > 0.3˚                  contact                 -
PSD               optical      depends on optical system  0.01 %                  non-contact             both in 1D and 2D

Table 2.3. Position sensors

The LVDC (Linear Variable Differential Capacitor) has some resemblance to the LVDT but uses a capacitive method. As small differences in capacitance are difficult to measure, the LVDC requires careful mechanical design and expensive electronics.

Strain gauges are mostly used for force measurement. Gauges are made of electrical conductors, usually thin wire or foil, bonded to the beam or other object whose strain is being measured (strain, that is mechanical deformation, is linearly related to stress, or applied force, within the limits of elastic behaviour of the beam). The resistance of a gauge varies with its deformation and so with the beam's strain. As the deformation is usually small, the change in resistance is small as well. Application of strain gauges is a highly skilled art. Gauges must be bonded to a clean surface with the proper type of cement. They must be aligned properly and temperature compensated.

Ultrasonic distance sensors are based upon the time-of-flight principle. An ultrasonic impulse is sent, and the time it takes before the reflected sound is received again by the transducer (in the meantime switched over to receiving mode) is a measure of the distance. Since the sound travels to the object and back, the distance is computed by multiplying the velocity of sound in air by the measured time interval and dividing by two. Because the sound velocity in air is temperature dependent, changes in temperature influence the measurement. Very cheap distance measurements can be realised in this way. The Polaroid company was the first to use this method for the distance measurement in its cameras.
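The time-of-flight calculation and its temperature dependence can be sketched as follows. The linear approximation for the speed of sound in dry air (331.3 m/s at 0 °C, rising by about 0.606 m/s per degree Celsius) is a commonly used rule of thumb and is an assumption of this sketch, not a value taken from the text; the function names are illustrative.

```python
def sound_speed(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def distance(round_trip_s, temp_c=20.0):
    """Distance to the reflector in metres.

    The pulse travels to the object and back, hence the division by two.
    """
    return sound_speed(temp_c) * round_trip_s / 2.0


# The same 10 ms echo interpreted at two different air temperatures.
print(distance(0.010, temp_c=20.0))
print(distance(0.010, temp_c=0.0))
```

For a 10 ms echo the two temperatures give distances that differ by several centimetres, which illustrates why the accuracy of this sensor is temperature dependent.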


Absolute encoders are high-precision rotary devices that, like a resolver, are mounted on the shaft of a rotary drive. They encode the angular position in a binary code. This code is read from one or more discs with concentric rings of photographed or etched codes. In figure 2.8 this principle is illustrated for 16 positions with 4 code rings. A large encoder may have 10 to 20 rings and is quite expensive. Cheaper solutions can be found with incremental encoders, which determine the position by counting the number of steps. However, in this case no absolute position is obtained.

Figure 2.8 Absolute encoder for 16 positions in binary and Gray code
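The two codes of figure 2.8 are related by a simple bit operation. The sketch below uses the standard reflected Gray code; whether a particular encoder uses exactly this variant is an assumption. A known property of this code is that neighbouring positions differ in only one ring, so a reading taken exactly on a boundary is off by at most one position.

```python
def binary_to_gray(n):
    """Reflected Gray code of the non-negative integer n."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# Code words for the 16 positions of a 4-ring encoder.
codes = [format(binary_to_gray(i), "04b") for i in range(16)]
for position, code in enumerate(codes):
    print(position, format(position, "04b"), code)
```

The decoder `gray_to_binary` would be applied to the pattern read from the rings to recover the position number.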

Position sensitive devices (PSDs). The position of an illuminating light beam can be calculated with a PSD. In the one-dimensional configuration illustrated in figure 2.9, a PSD consists of a rectangular (e.g. 34 x 2.5 mm^2) diode. The back side of the diode is fully metallized and forms the return electrode. The front side is the light-sensitive side, with two contacts A and B. When a light beam hits the device, a current is generated by the photoelectric effect. This current is split into two currents ia and ib flowing to contacts A and B. The PSD is manufactured so that the surface resistance of the layer is extremely uniform (within 1%). The resistances Ra and Rb are therefore proportional to the lengths a and b:


Figure 2.9 Line and area PSD's. Line PSD: the incident light beam hits a device of length D at distance a from electrode A and distance b from electrode B. Area PSD: light-sensitive area of intrinsic silicon with beam position (x,y) and electrode currents Ix1, Ix2, Iy1 and Iy2.

$$ \frac{i_a}{i_b} = \frac{R_b}{R_a} = \frac{b}{a} = \frac{D - a}{a} $$

and so:

$$ \frac{i_b - i_a}{i_b + i_a} = \frac{2a}{D} - 1. $$

Thus, (ib - ia) / (ib + ia) varies linearly with a, so the position of the light spot follows from the two measured currents. In general the light beam has a certain diameter. The output of the PSD represents in that case the centre of gravity of the beam. The spectral response of a PSD ranges from 400 nm (blue) to 1000 nm (infrared) with a peak at 900 nm. The sensitivity is around 0.6 A/watt. The resolution obtainable with a PSD is determined by the noise in the signals ia and ib. Accuracies attainable are in the range of 1 : 10^4. The influence of dark current and environmental light can be largely reduced by the use of pulsed light.
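Solving the current ratio for a gives a = (D/2)(1 + (ib - ia)/(ib + ia)). A minimal sketch of this calculation is given below; the function name and the example currents are illustrative, and noise, dark current and ambient light are ignored.

```python
def psd_position(i_a, i_b, length):
    """Distance a of the light spot from electrode A on a line PSD.

    i_a, i_b : currents to electrodes A and B (any common unit)
    length   : distance D between the two electrodes
    """
    total = i_a + i_b
    if total <= 0:
        raise ValueError("no photocurrent: the beam is not on the device")
    return 0.5 * length * (1.0 + (i_b - i_a) / total)


# A spot 10 mm from electrode A on a 34 mm device gives ia/ib = 24/10.
print(psd_position(2.4, 1.0, 34.0))
# Equal currents mean the spot is in the middle of the device.
print(psd_position(1.0, 1.0, 34.0))
```

Because only the ratio of the currents enters, the result does not depend on the intensity of the light beam.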

2.6.2 Scanning principles for distance images

The methods to obtain range or distance images (also called 2½-D images) are mainly based on triangulation. There are two approaches: active and passive. In the active approach a scanning light source is used in combination with an imaging system. In the passive approach two imaging systems are used (stereo vision) or a moving sensor system (multiple view technique). In particular, a multiple view method can give complete 3D images. In the stereo vision type of approach, the distances to scene points are calculated from the displacements between the two images of known (identified) scene points. Problems arise in the matching because of occlusion and multiple matches. An estimate of the shape can also be obtained from the shading or the texture in the image ('shape from' techniques). In the following we will restrict ourselves to active techniques and leave further discussion of passive techniques to courses on image processing.


Figure 2.10 Principle of triangulation (laser, laser beam, object, lens and position detector). A difference in distance results in a displacement in the sensor image.
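As a minimal sketch of the triangulation principle of figure 2.10, consider one simple geometry that is assumed here and not taken from the text: the laser beam runs parallel to the optical axis of the lens at a baseline distance b, and the position detector lies in the focal plane at focal distance f. The spot on an object at distance z is then imaged at an offset u = f b / z from the optical axis, so that z = f b / u. All names and numbers are illustrative.

```python
def range_from_displacement(u_mm, baseline_mm, focal_mm):
    """Object distance z from the spot offset u on the position detector.

    Assumes a laser beam parallel to the optical axis at distance
    baseline_mm and a detector in the focal plane (pinhole model).
    """
    if u_mm <= 0:
        raise ValueError("spot offset must be positive")
    return focal_mm * baseline_mm / u_mm


# Assumed set-up: 100 mm baseline, 25 mm focal length.
for u in (0.5, 1.0, 2.0):
    print(u, range_from_displacement(u, 100.0, 25.0))
```

In this geometry the displacement is inversely proportional to the distance, so the resolution in distance deteriorates for far objects.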

When a scene is illuminated by a narrow light beam, the distance to the illuminated scene element can be calculated by triangulation. Only the position of the illuminated element is needed to calculate the distance, so both image sensors and PSD's may be used for this purpose. A range image can be obtained by a complete scan of the scene with a light beam. One dimension of the scanning can be provided by the movement of the object (or system). In that case only a one-dimensional range profile has to be calculated in a plane intersection across the scene, perpendicular to the direction of motion. The total range image is obtained by combining the range profiles of the successive object positions. When no movement of the object is present, an additional mechanical motion of the whole sensor system is necessary. Commercial systems based on these principles are available, but are in general slow.

2.6.3 Sound

A microphone responds to acoustical signals coming from different directions, but usually the sensitivity is direction-dependent. The sensitivity of a microphone also depends upon the frequency of the acoustical signal. In figure 2.12b the sensitivity of a common electret microphone is sketched. An electret microphone is based upon the following principle: a vibrating foil forms a capacitor with a second plate, and the varying capacitance due to the vibration of the foil is translated into a changing voltage because a permanent electrical charge is present, caused by an electret mounted on the fixed plate. (An electret is made by heating a dielectric material and then suddenly cooling it in the presence of a strong electric field.) The sensitivity of this microphone as a function of frequency is flat from 100 Hz to 5 kHz, and it decreases for higher and lower frequencies (with a peak for this specific microphone at 10 kHz). For most applications the frequency range of a microphone should cover the range of human hearing (20 Hz - 20 kHz).


Figure 2.12 Sensitivity of an electret microphone as function of the direction angle (top: polar plot over 0°, 90°, 180° and -90° with rings at -10 dB, -5 dB and 0 dB) and of the frequency of the acoustical signal (bottom: sensitivity in dB against frequency in Hz, from 10 Hz to 10 kHz)

2.6.4 Compression and expansion

In speech, small amplitudes are much more frequent than large amplitudes. Hence if the microphone signal is uniformly quantized (using an equal distance Δv between the quantization levels of the ADC), many levels are seldom used. This can be improved either by choosing a non-uniform distribution of quantization levels (resulting in a higher density of levels in the lower part of the ADC's input range), or by compressing the input signal prior to sampling. The latter is often done (for instance in digital telephone systems) according to the so-called µ-law:

$$ y = \frac{\log(1 + \mu x)}{\log(1 + \mu)} $$

where x is the (positive) input voltage. Both x and y are normalized over the range (0,1) in this simplified version. With an 8-bit ADC and µ = 255, the ADC's output SNR is roughly constant over a 40 dB range. For restoring the original signal, an expander based on the inverse function is needed. A comparable compression scheme is based on the A-law. Evidently, these companding techniques are not restricted to microphone signals.
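The compressor and the matching expander can be sketched as follows for the simplified µ-law above (positive, normalized input). The inverse x = ((1 + µ)^y - 1) / µ follows directly from the formula; the helper `adc_code`, which models an ideal 8-bit ADC after the compressor, is an illustrative assumption.

```python
import math

MU = 255.0


def compress(x, mu=MU):
    """Simplified mu-law compressor for 0 <= x <= 1."""
    return math.log(1.0 + mu * x) / math.log(1.0 + mu)


def expand(y, mu=MU):
    """Inverse of compress: restores the normalized input value."""
    return ((1.0 + mu) ** y - 1.0) / mu


def adc_code(x, bits=8):
    """Output code of an ideal ADC placed after the compressor."""
    return round(compress(x) * (2 ** bits - 1))


# A small amplitude of 1% of full scale, with and without compression.
x = 0.01
print(round(x * 255))          # code of a uniform 8-bit quantizer
print(adc_code(x))             # code after mu-law compression
print(expand(compress(x)))     # the expander restores the input
```

For an input of 1% of full scale the uniform quantizer uses only the lowest few codes, whereas the compressed signal reaches code 58, so small amplitudes are represented with many more levels.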


2.7 Standards for measurement systems
For many years instrument manufacturers have worked to standardize the electrical and mechanical interface between instruments and computers. A well-known example is the HP-IB or IEEE-488 bus [2.22]. Many stand-alone measuring instruments on the market are equipped with an IEEE-488 interface. Connecting such instruments to a workstation and controlling them through this interface makes it easy to set up an automated measuring system. There are also cards (for instance audio and video digitizers from different vendors) for the internal buses of widespread systems and workstations such as IBM compatibles (ISA- or AT-bus), SUN workstations (S-bus) and systems using the VME-bus.

Although the electrical and mechanical interfaces between instruments and computer systems are standardized, the messages sent over the interface are not. As a result many different command sets were developed, not only for different measuring instruments but even for the same type of measuring instrument from different vendors. Recently, Standard Commands for Programmable Instrumentation (SCPI) have been defined and are being adopted by an increasing number of instrument manufacturers [2.23]. The question arises whether a SCPI instrument from one vendor can be replaced by an instrument from another vendor. Unfortunately, complete interchangeability cannot be guaranteed as of today. Nevertheless, SCPI provides a high degree of consistency among instruments. The command to measure a frequency is the same whether the measurement is made by an oscilloscope or a counter.

A fundamental objective of SCPI is to provide a simple way to perform simple operations. The MEASure command is the easiest way to configure and read data from an instrument. When the program message (in which the lowercase letters are optional)

:MEASure:VOLTage:AC?

is received by a voltmeter, the meter will select settings and configure itself for an AC voltage measurement, initiate the measurement and return the result to the system controller. A user can specify characteristics of the signal measurement, such as the expected signal value or the resolution of the measurement, by adding parameters to the command. For example,

:MEASure:VOLTage:AC? 20, 0.001

instructs the meter to configure itself for an AC measurement on a signal of around 20 volts with a resolution of 0.001 volts.

To provide direct control over an instrument's hardware, SCPI contains command subsystems that control particular instruments and settings. To define the commands used to provide this control, SCPI uses a generalized model of a programmable instrument, shown in figure 2-13. The model defines where elements of the language must be assigned in the SCPI hierarchy. Major areas of signal functionality are shown as blocks. The signal routing block takes care of the routing of signals between an instrument's port and its internal signal functionality.
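The rule that the lowercase letters of a mnemonic are optional can be made concrete with a small sketch. The helper names `short_form` and `same_command` are hypothetical, and the sketch assumes that an instrument accepts each mnemonic only in its full or its abbreviated spelling, in upper or lower case. It performs no bus communication.

```python
import re


def short_form(command):
    """Drop the optional lowercase letters from every mnemonic."""
    return re.sub(r"[a-z]+", "", command)


def same_command(received, pattern):
    """True if `received` spells each mnemonic of `pattern` in long or short form.

    Assumption: case is ignored and no intermediate abbreviations are allowed.
    """
    got = received.upper().split(":")
    want = pattern.split(":")
    if len(got) != len(want):
        return False
    return all(g in (w.upper(), short_form(w)) for g, w in zip(got, want))


if __name__ == "__main__":
    pattern = ":MEASure:VOLTage:AC?"
    print(short_form(pattern))
    for message in (":MEAS:VOLT:AC?", ":measure:voltage:ac?", ":MEASU:VOLT:AC?"):
        print(message, same_command(message, pattern))
```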


Figure 2-13. Generalized Instrument Model of SCPI. Measurement path: signal in, signal routing, measurement function, format, to bus. Generation path: from bus, format, signal generation, signal routing, signal out. Further blocks: trigger, memory, display.

The measurement function block converts a physical signal into an internal data form that is available for formatting into bus data. It may perform the additional tasks of signal conditioning and post-conversion calculation. The signal generation block is responsible for the conversion of data into physical signals. It may perform the additional tasks of pre-conversion calculation and signal conditioning. The purpose of the trigger block is to let an instrument synchronize with external events. The purpose of the memory block is to hold data inside the instrument. While every programmable instrument contains memory, not all such instruments provide explicit control of this memory. The format block converts between data representations, especially for data that is transferred over the external interface. An example is the conversion of internal data formats into ASCII. The purpose of the display block is to control the display of the signals, if the instrument is equipped for that purpose.

SCPI is a major advance in providing a standard instrument vocabulary. It offers the user shorter programming time and programs that are easier to understand and maintain, and it greatly increases the likelihood of instrument interchangeability.

2.8 Virtual sensors
Often we won't be dealing with one single sensor, but with a more complex system, incorporating not only the sensor but also the processing of the sensor data and the control of sensor parameters. This makes it hard to define where in a complex system the sensing ends and the processing begins. This point can be illustrated with a distance image, in which the pixel values give the distance to the closest object. It can be obtained with an acoustical imaging system that measures the time-of-flight of a reflected sound pulse: a complex sensing system without complex processing. It can also be obtained with stereo vision techniques, where we have two images of a scene from different viewpoints. From the disparities between the objects in the images the distance can be calculated. In this case substantial processing is required to find the disparities between the images and to convert these to distances. A third technique is based on structured illumination of the scene, where both processing of the video image and active control of the illumination are required to obtain a distance image. Is the first approach a sensor and the others not? Should it depend upon the definition of what is measured whether we call the system a sensor or not?

2.8.1 Sensor model
One way to model more complex sensing systems is to define virtual (or logical) sensors. A system for measuring a certain property is called a virtual sensor. A virtual sensor can be identical to the traditional sensor, converting a simple physical quantity into an electrical signal (and therefore sometimes called a transducer), but it can also be some complex processing routine. This approach opens the way to create many different virtual sensors through combinations of others, leading to a flexible and modular sensor system structure.

An important aspect of such a sensor model is that it is capable of handling the robustness of the system. When a human interprets a scene, he always has expectations about what may be present in the scene, what sort of objects could physically exist, and which constraints are set by the laws of physics, to mention a few aspects. A sensor (processing) model should be capable of dealing with expectations and with uncertainty in the measurements. Obtaining a reliable and robust interpretation of sensor data is of great importance, since an erroneous interpretation may lead to unwanted actions in (autonomous) information systems. It is therefore a major area for study. Robustness can be obtained by measuring the desired quantity in different ways or with different sensors. By judging the consistency of the results, errors can be detected.
With statistical techniques the different measurements can be combined to obtain a best estimate of the quantity to be measured, or to reject outliers in the measurements. When no completely different measuring strategy is used, and for instance the same sensor is used to repeat the measurement, the same error or exception can be present and will not be detected from the results alone. Detection is possible, however, when we use a priori knowledge of what to expect. Within this virtual sensor concept a mechanism has been incorporated to handle an 'erroneous' input to a virtual sensor. Every input is judged by an 'acceptance test', which results in the acceptance or rejection of the input. In case of rejection, the virtual sensor has a list of alternative virtual sensors available that can provide the same input. From this list it picks the next virtual sensor, which is then activated. A virtual sensor fails when its list of alternatives is exhausted.
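The acceptance-test mechanism described above can be sketched in a few lines. The class and attribute names are hypothetical, and the two distance sources are stand-ins that return fixed values; the sketch only shows the control flow of trying alternatives in turn until one passes the acceptance test.

```python
class SensorFailure(Exception):
    """Raised when a virtual sensor has exhausted its list of alternatives."""


class VirtualSensor:
    def __init__(self, name, alternatives, accept):
        self.name = name
        self.alternatives = alternatives  # ordered list of callables providing the input
        self.accept = accept              # acceptance test: value -> bool

    def read(self):
        """Activate the alternatives one by one until an input is accepted."""
        for source in self.alternatives:
            value = source()
            if self.accept(value):
                return value
        raise SensorFailure(f"{self.name}: all alternatives rejected")


if __name__ == "__main__":
    def time_of_flight():
        return -1.0  # stand-in for an erroneous reading

    def stereo_vision():
        return 2.4   # stand-in for a plausible distance in metres

    distance = VirtualSensor(
        "distance",
        [time_of_flight, stereo_vision],
        accept=lambda d: 0.0 < d < 10.0,  # a priori knowledge of the expected range
    )
    print(distance.read())
```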


2.9 References

2.1 K. Lion: Transducers: problems and prospects. IEEE Trans. Industr. Electron. & Control. Instrum., IECI-16 (1969) pp 2-5.
2.2 S. Middelhoek, S.A. Audet: Silicon Sensors. TUD, Department of Electrical Engineering Et 05-31.
2.3 L.N. Reijers, H.J.L.M. de Haas: Flexibele Produktie Automatisering, deel III Industriele robots. Technische Uitgeverij De Vey Mestdagh BV., Middelburg.
2.4 R. Schasfoort: Chemische sensorontwikkeling bij TNO. Sensornieuws, vol 2 (1993) pp 8-10.
2.5 A. Novini: Before you buy a Vision System... Manufacturing Engineering, vol. 94 (1985) no 3, pp 42-48.
2.6 H.E. Schroeder: Practical illumination concept and technique for machine vision applications. Proc. Robots 8 (1984), pp 14-43.
2.7 R.A. Jarvis: A perspective on range finding techniques for computer vision. IEEE PAMI-5 (1983) pp 122-139.
2.8 H.R. Everett: Survey of collision avoidance and ranging sensors for mobile robots. Robotics and Autonomous Systems, 5 (1989).
2.9 S. Inokuchi, K. Sato, F. Mutsuda: Range-imaging system for 3-D object recognition. Proc. 7th Int. Conf. on Pattern Recognition (1984) pp 806-808.
2.11 S.T. Barnard, W.B. Thompson: Disparity analysis of images. IEEE PAMI, Vol. PAMI-2, No. 4, July 1980, pp 333-340.
2.12 Fairchild: CCD. The solid State Imaging Technology.
2.15 Philips: The frame-transfer sensor an attractive alternative to the tv camera tube. Philips Technical Publication 150, 1985.
2.16 Videk: Megaplus camera: CCD Camera for high resolution applications. Videk, New York.
2.22 K. Jenssen: VXIbus: A new interconnection standard for modular instruments. Hewlett-Packard Journal, Vol 40, no. 2, April 1989, pp 91-94.
2.23 Standard Commands for Programmable Instrument Manual, Version 1990.0, April 1990.
2.24 Owen Bishop: Practical Electronic Sensors. Bernard Babani, London 1991.
2.25 R. Pallas-Areny, J.G. Webster: Sensors and Signal Conditioning. Wiley 1991.


3. Continuous-time signals and systems

As we have seen in the preceding chapters, a sensor is used for observing some physical quantity over an interval of time. Hence, using a caliper to measure the diameter of a Dutch five-cent coin wouldn't be called sensing. We know beforehand that we will find the same value (21 mm) once and for all; in other words, continuing the measurement doesn't provide any information. We speak of sensing only when we expect to find information-bearing changes in the quantity being observed. Changes in the observed value may also depend on quantities other than time. For instance, the gray value of a photograph is a function of the coordinates of the point where it is being measured. On the target of a CCD camera, the illumination of a photosite is a function of both position and time.

The result of sensing is a signal: an information-conveying function of one or more independent variables. In the physical world, the independent variables (like time, position) are almost always continuous. Man-made signals sometimes have a discrete independent variable: the Dow Jones index is determined once every day and is undefined in between. The dependent variable (length, gray value) is often a continuously variable scalar quantity. Colour on the other hand is a vector-valued quantity (r,g,b); and a Morse signal has a discrete dependent variable (mark, space). The representation of a signal inside a digital computer is necessarily discrete in both the dependent and the independent variables.

To be able to manipulate signals, we need a mathematical description. Confining ourselves to scalar continuous-time signals x = f(t), an obvious method of description is to specify x for every t. Unfortunately this can be done only if we know the analytic form of the function f(t) beforehand: for instance we might know that x = a.sin(bt). But we see immediately that this signal is a trivial one (though somewhat less trivial than the coin diameter 'signal'): once a and b have been determined, the signal is known for all time and no information is being conveyed. Real, information-bearing signals essentially have some degree of unpredictability. Even then, however, such signals are subject to certain constraints. For instance, any signal has limited duration and, equally important, has only finite detail: an audio signal doesn't contain 'frequencies' beyond 15 kHz (we say that it is band limited; in fact, any signal of physical origin is band limited). Because of such restrictions, a signal (and the message it conveys) can be described by a finite number of parameters. From information theory we know that this is a condition for information: the number of possible messages has to be finite, otherwise their coding will be impossible.

This chapter is devoted to the question how a given continuous-time signal can be parametrized. In chapter 4, the same topic is discussed for discrete-time signals. The theory can easily be generalized to other kinds of signals, like images.


3.1 Least-squares approximation of a function
Let f(t) be given on the interval (t1,t2). In what follows we will generally assume that t denotes time. We want to approximate f(t) as closely as possible by c.ϕ(t), where ϕ(t) is a different function defined on the same interval. The value of the constant c is chosen so as to minimize the integral-square or L2 norm

$$J(c) = \int_{t_1}^{t_2} \bigl(f(t) - c\,\phi(t)\bigr)^2\,dt .$$

We can calculate c by noting that at the minimum the derivative of J(c) should equal zero: dJ/dc = 0. This leads to

$$c = \int_{t_1}^{t_2} f(t)\,\phi(t)\,dt \Big/ \int_{t_1}^{t_2} \phi^2(t)\,dt .$$

Example: ϕ(t) = 1 for t ∈ (t1,t2); arbitrary f(t). Then we have

$$c = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} f(t)\,dt ,$$

or in words: c is the average of f(t) over the interval (t1,t2).

3.2 Orthogonal functions
f(t) and ϕ(t) are called orthogonal if their inner product

$$(f,\phi) = \int_{t_1}^{t_2} f(t)\,\phi(t)\,dt$$

is zero. Assume that we have a collection of mutually orthogonal functions {ϕi(t)}, i ∈ Z, for which by definition the inner product satisfies

$$\int_{t_1}^{t_2} \phi_i(t)\,\phi_j(t)\,dt = 0 \quad \text{if and only if } i \neq j . \qquad (3.1)$$

This time we want to approximate f(t) by a linear combination (a weighted sum)

$$\sum_{i=-\infty}^{\infty} c_i\,\phi_i(t) .$$

Again we determine the coefficients ci by minimizing the L2 norm

$$J(\ldots,c_1,c_2,c_3,\ldots) = \int_{t_1}^{t_2} \Bigl( f(t) - \sum_{i=-\infty}^{\infty} c_i\,\phi_i(t) \Bigr)^2 dt , \quad i \in Z .$$


Thus, by putting ∂J/∂ci = 0 and using (3.1) we obtain for the i-th coefficient

$$c_i = \int_{t_1}^{t_2} f(t)\,\phi_i(t)\,dt \Big/ \int_{t_1}^{t_2} \phi_i^2(t)\,dt \qquad (3.2)$$

independent of all other coefficients! This is called the expansion of f(t) into a series of orthogonal basis functions {ϕi(t)}. It sometimes helps to gain insight into these matters if we consider the basis functions as mutually orthogonal vectors in an infinite-dimensional space (called Hilbert space). Then we can regard a given function as being represented by another vector, and its Fourier coefficients ci as the projections of this vector onto the basis vectors. The length of a Hilbert space vector associated with the function f(t) (always defined on the same interval (t1,t2)) is, as in standard geometry, the square root of

$$(f,f) = \int_{t_1}^{t_2} |f(t)|^2\,dt .$$
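Equation (3.2) lends itself to a quick numerical check. The sketch below approximates the two integrals with the midpoint rule and computes the coefficient for f(t) = t on (0,1), first for ϕ(t) = 1 (which should give the average, 0.5) and then for ϕ(t) = sin(2πt). The helper names and the choice of test function are assumptions made for illustration only.

```python
import math


def integrate(g, t1, t2, n=2000):
    """Midpoint-rule approximation of the integral of g over (t1, t2)."""
    h = (t2 - t1) / n
    return h * sum(g(t1 + (k + 0.5) * h) for k in range(n))


def coefficient(f, phi, t1, t2):
    """Least-squares coefficient c = (f, phi) / (phi, phi), cf. equation (3.2)."""
    numerator = integrate(lambda t: f(t) * phi(t), t1, t2)
    denominator = integrate(lambda t: phi(t) * phi(t), t1, t2)
    return numerator / denominator


if __name__ == "__main__":
    f = lambda t: t
    print(coefficient(f, lambda t: 1.0, 0.0, 1.0))                       # average of f
    print(coefficient(f, lambda t: math.sin(2 * math.pi * t), 0.0, 1.0))  # analytically -1/pi
```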

3.3 Fourier series expansion of a function defined on a finite interval
Consider a function f(t) on (t1,t2) with t2 - t1 = T. The set of basis functions

{sin(nΩt), cos(nΩt)}, n ∈ N, Ω = 2π/T, (3.3)

constitutes an orthogonal system on the interval (t1,t2), since their mutual inner products satisfy the following for all n, m ≥ 1:

(sin(nΩt), sin(mΩt)) = πδnm/Ω
(cos(nΩt), cos(mΩt)) = πδnm/Ω
(cos(nΩt), sin(mΩt)) = 0

where δnm = 1 if n = m, δnm = 0 if n ≠ m. If we have a function sin(Ωt) or cos(Ωt) we call Ω its (angular) frequency; it is measured in radians per second. Technicians commonly use the (period-) frequency ν = Ω/2π. This is the number of periods per second and is measured in Hertz (Hz). The representation of a function as a weighted sum of basis functions from the set (3.3) is called a Fourier series expansion. A Fourier series can be written down in various ways:

$$
\begin{aligned}
f(t) &= a_0 + \sum_{n=1}^{\infty} \{ a_n \cos(n\Omega t) + b_n \sin(n\Omega t) \} \\
     &= \sum_{n=0}^{\infty} \{ a_n \cos(n\Omega t) + b_n \sin(n\Omega t) \} \\
     &= a_0 + \sum_{n=1}^{\infty} c_n \cos(n\Omega t + \gamma_n) \\
     &= a_0 + \sum_{n=1}^{\infty} d_n \sin(n\Omega t + \delta_n) \\
     &= \sum_{n=-\infty}^{\infty} F_n\, e^{i n \Omega t}
\end{aligned}
\qquad (3.4)
$$

with

$$a_0 = \frac{1}{T}\int_{t_1}^{t_2} f(t)\,dt \quad \text{(average of the function } f(t)\text{)},$$

$$a_n = \frac{2}{T}\int_{t_1}^{t_2} f(t)\cos(n\Omega t)\,dt , \qquad b_n = \frac{2}{T}\int_{t_1}^{t_2} f(t)\sin(n\Omega t)\,dt \quad \text{(inner products)}.$$

The factor 2/T results from the fact that

$$\int_{t_1}^{t_2} \sin^2(n\Omega t)\,dt = \int_{t_1}^{t_2} \cos^2(n\Omega t)\,dt = \frac{T}{2}$$

(recall that t2 - t1 = T). It is not difficult to see that the quantities c, γ, d, δ and F can be derived from a and b, and vice versa. For instance F0 = a0, Fn = (an - i bn)/2, F-n = (an + i bn)/2, n ≥ 1. The original function f(t) can be continued periodically, and the same is true for f(t)'s Fourier series expansion. For the particular type of functions we are discussing here (defined on a finite interval or periodic) the Fourier series expansion embodies their frequency-domain representation. Sometimes Ω = 2π/T is called the fundamental frequency in f(t)'s Fourier series expansion. For n > 0, nΩ is called the n-th harmonic of Ω. Figure 3-1 gives some examples of Fourier series expansions.

3.4 Energy and power; power spectrum
The energy of f(t), t ∈ (t1,t2) with t2 - t1 = T, is

$$E = \int_{t_1}^{t_2} |f(t)|^2\,dt ;$$

its power is P = E/T (power = energy per unit of time; units: watt = joule / second). The contribution to P of the Fourier component of f(t) with frequency nΩ = n.2π/T is

$$P_n = \frac{1}{T}\,a_n^2 \int_{t_1}^{t_2} \cos^2(n\Omega t)\,dt + \frac{1}{T}\,b_n^2 \int_{t_1}^{t_2} \sin^2(n\Omega t)\,dt = \frac{1}{2}\,(a_n^2 + b_n^2) .$$


For the total power of f(t) the Parseval relation holds:

$$P = \sum_{n=0}^{\infty} P_n = \frac{1}{T}\int_{t_1}^{t_2} f^2(t)\,dt = a_0^2 + \frac{1}{2}\sum_{n=1}^{\infty} (a_n^2 + b_n^2) ,$$

where the constant term contributes P0 = a0².

This relation shows that the distribution of power over the sine- and cosine-components, which evidently depends on the choice of the time origin, has no effect on the total power. The sequence {Pn} is called the power spectrum of f(t). It is a discrete ('line-') spectrum, since it is defined only for discrete values of the frequency. Because the sine- and cosine-terms corresponding to any harmonic are put together in the power spectrum, phase information is lost, and it is not possible to reconstruct f(t) from its power spectrum.
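The Parseval relation can be verified numerically for a simple test signal. The sketch below uses the hypothetical signal f(t) = 1 + 2cos(Ωt) + 3sin(2Ωt) with T = 2, computes its Fourier coefficients by numerical integration, and compares the sum of the power spectrum with the time-domain power. For this signal both should come out as 1 + (4 + 9)/2 = 7.5.

```python
import math

T = 2.0
OMEGA = 2 * math.pi / T


def f(t):
    """Test signal (an assumption for this illustration)."""
    return 1.0 + 2.0 * math.cos(OMEGA * t) + 3.0 * math.sin(2 * OMEGA * t)


def mean_over_period(g, n=4096):
    """(1/T) times the integral of g over one period, midpoint rule."""
    h = T / n
    return sum(g((k + 0.5) * h) for k in range(n)) * h / T


def fourier_coefficients(n):
    """Return (a_n, b_n); for n = 0 this is (a_0, 0)."""
    if n == 0:
        return mean_over_period(f), 0.0
    a = 2.0 * mean_over_period(lambda t: f(t) * math.cos(n * OMEGA * t))
    b = 2.0 * mean_over_period(lambda t: f(t) * math.sin(n * OMEGA * t))
    return a, b


def power_spectrum(n_max):
    """Line spectrum {P_n}: P_0 = a_0**2 and P_n = (a_n**2 + b_n**2) / 2 for n >= 1."""
    a0, _ = fourier_coefficients(0)
    spectrum = [a0 ** 2]
    for n in range(1, n_max + 1):
        a, b = fourier_coefficients(n)
        spectrum.append(0.5 * (a * a + b * b))
    return spectrum


if __name__ == "__main__":
    print(power_spectrum(3))
    print(sum(power_spectrum(3)), mean_over_period(lambda t: f(t) ** 2))
```

Swapping the sine term for a cosine of the same amplitude changes the individual an and bn but leaves every Pn unchanged, which illustrates the loss of phase information.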

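Figure 3-1 Examples of Fourier series expansions (waveform panels a to g):

a) $\frac{A}{2} + \frac{2A}{\pi}\left(\cos\Omega t - \frac{\cos 3\Omega t}{3} + \frac{\cos 5\Omega t}{5} - \ldots\right)$
b) $\frac{2A}{\pi}\left(\sin\Omega t + \frac{\sin 3\Omega t}{3} + \frac{\sin 5\Omega t}{5} + \ldots\right)$
c) $\frac{4A}{\pi^2}\left(\sin\Omega t - \frac{\sin 3\Omega t}{3^2} + \frac{\sin 5\Omega t}{5^2} - \ldots\right)$
d) $\frac{4A}{\pi^2}\left(\cos\Omega t + \frac{\cos 3\Omega t}{3^2} + \frac{\cos 5\Omega t}{5^2} + \ldots\right)$
e) $\frac{A}{\pi}\left(\sin\Omega t - \frac{\sin 2\Omega t}{2} + \frac{\sin 3\Omega t}{3} - \ldots\right)$
f) $-\frac{A}{\pi}\left(\sin\Omega t + \frac{\sin 2\Omega t}{2} + \frac{\sin 3\Omega t}{3} + \ldots\right)$
g) $\frac{2}{T}\left(\frac{1}{2} + \cos\Omega t + \cos 2\Omega t + \cos 3\Omega t + \ldots\right)$

with A the top-to-top amplitude, T the interval length, and Ω = 2π/T.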
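As a check on one of these expansions, the sine coefficients of a square wave can be computed directly from the inner-product formula for bn. The sketch assumes that series b) belongs to an odd square wave that equals +A/2 on the first half of the period and -A/2 on the second half; under that assumption bn should equal 2A/(nπ) for odd n and zero for even n.

```python
import math

A = 1.0  # top-to-top amplitude
T = 1.0  # interval length
OMEGA = 2 * math.pi / T


def square(t):
    """Odd square wave: +A/2 on the first half period, -A/2 on the second."""
    return A / 2 if (t % T) < T / 2 else -A / 2


def b_n(n, samples=20000):
    """b_n = (2/T) * integral of f(t) sin(n Omega t) over one period (midpoint rule)."""
    h = T / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * h
        total += square(t) * math.sin(n * OMEGA * t)
    return 2.0 / T * total * h


if __name__ == "__main__":
    for n in range(1, 6):
        expected = 2 * A / (n * math.pi) if n % 2 else 0.0
        print(n, round(b_n(n), 6), round(expected, 6))
```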


3.5 Example: f(t) is an impulse
In this example we consider an impulse of width 2Δ and height A = 1/(2Δ) (figure 3-2). We assume again that f(t) is given on a finite time interval of duration T. However, we could as well say that f(t) is a periodic function with period T. The Fourier series expansion is identical in both cases. In order to determine the Fourier coefficients {an} we choose the time origin (t = 0) at the centre of the impulse. f(t) is an even function in that case; it can be written as a cosine series (i.e. in (3.4) the coefficients bn of the sine terms are all zero). Now we have, with Ω = 2π/T,

$$a_n = \frac{2}{T}\int_{t_1}^{t_2} f(t)\cos(n\Omega t)\,dt = \frac{4A}{T}\int_{0}^{\Delta} \cos(n\Omega t)\,dt \qquad (3.5a)$$

$$\phantom{a_n} = \frac{2}{T}\,\frac{\sin(n\Omega\Delta)}{n\Omega\Delta} . \qquad (3.5b)$$

We now consider two limiting cases, A and B.

Case A: T is constant, Δ → 0 while the area under the impulse, 2AΔ, remains constant = 1. The limit case impulse is often called the Dirac delta function, although it is not a function in the usual sense (in fact, it is a so-called distribution). In this case, the equation for an becomes an = 2/T in the limit, independent of n! In words: a δ-impulse function (periodic or on a finite interval) has a flat spectrum. This is illustrated in the last example in figure 3-1. To derive this we have used the well-known fact that for x → 0, sin(x)/x → 1. The function sin(x)/x (figure 3-3) is so common in signal processing theory (and elsewhere) that it got its own name: sinc(x).

Case B: Δ is constant, T → ∞ (a finite impulse defined on an infinite interval). What happens now is not so easy to see. We can describe it qualitatively as follows: if T is made larger and larger, the lines in the spectrum of our finite-width impulse will come ever closer to each other (because their spacing is Ω = 2π/T, cf. equation 3.5).

Figure 3-2 Impulse function

Figure 3-3 The function sin(x)/x


In the limit for T → ∞ the original line spectrum changes into a continuous spectrum F(Ω) of finite width which should be interpreted as a density function. That is, the contribution to the impulse's power by frequencies between Ω1 and Ω2 is proportional to the area under F(Ω) between the given limits. In the present case, F(Ω) is a sinc function similar to the envelope of the discrete spectrum (3.5b) in the original example:

$$F(\Omega) = 2A\Delta\,\mathrm{sinc}(\Omega\Delta) = \mathrm{sinc}(\Omega\Delta) , \qquad (3.5c)$$

where the last step uses A = 1/(2Δ). F(Ω) is called the Fourier transform of our Δ-impulse.
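The closed form (3.5b) and the flat-spectrum limit of case A can be checked with a short sketch. It evaluates the integral in (3.5a) numerically for an impulse of half-width Δ and height A = 1/(2Δ), compares it with (2/T).sinc(nΩΔ), and shows that an approaches 2/T when Δ becomes very small. The function names and parameter values are assumptions for illustration.

```python
import math


def sinc(x):
    """sin(x)/x with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def a_n_closed(n, T, delta):
    """Equation (3.5b): a_n = (2/T) * sinc(n * Omega * delta)."""
    omega = 2 * math.pi / T
    return 2.0 / T * sinc(n * omega * delta)


def a_n_numeric(n, T, delta, samples=10000):
    """Equation (3.5a): (4A/T) * integral_0^delta cos(n Omega t) dt, midpoint rule."""
    omega = 2 * math.pi / T
    amplitude = 1.0 / (2 * delta)
    h = delta / samples
    integral = h * sum(math.cos(n * omega * (k + 0.5) * h) for k in range(samples))
    return 4 * amplitude / T * integral


if __name__ == "__main__":
    T = 1.0
    for n in (1, 3, 10):
        print(n, a_n_numeric(n, T, 0.05), a_n_closed(n, T, 0.05))
    print([round(a_n_closed(n, T, 1e-6), 6) for n in (1, 10, 100)])  # close to 2/T
```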

3.6 The continuous-time Fourier transformation (CTFT)
The Fourier transformation introduced in the previous section is applicable if we are dealing with the frequency representation of functions defined on the entire time-axis (-∞ < t < ∞) and satisfying certain conditions, the most important of which is that they have finite energy. This means in fact that they are localized more or less on the time axis. Thus, a periodic function cannot be described by a Fourier transform, only by a Fourier series as we have seen before. The general form of the so-called Fourier transform pair is

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\Omega)\,e^{+i\Omega t}\,d\Omega \quad \text{with} \quad F(\Omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\Omega t}\,dt . \qquad (3.6)$$

F(Ω) is the frequency-domain representation of f(t), but we could say equally well that f(t) is the time-domain representation of F(Ω). We will often indicate a Fourier pair by the notation f(t) ↔ F(Ω). It is possible (but not completely trivial) to show that (3.6) is valid by substituting the second expression in the first one; evaluating the resulting integral then results in the identity f(t) = f(t). It should be noted that either member of a Fourier pair can be a complex function. If F(Ω) is complex it is often useful to write it as the product of a phase factor and a real modulus or amplitude:

$$F(\Omega) = e^{i\phi(\Omega)}\,A(\Omega), \quad \text{with} \quad A(\Omega) = |F(\Omega)| , \quad \phi(\Omega) = \arctan\frac{\mathrm{Im}\,F(\Omega)}{\mathrm{Re}\,F(\Omega)} . \qquad (3.7)$$

Frequency-domain representation is often the obvious way of describing a signal. For instance, the sound emitted by a bowed violin string is characterized by the fact that it contains a certain fundamental frequency (say, 440 Hz) plus a number of harmonics (880 Hz, 1320 Hz,...), the relative intensities of which account for the typical timbre of the violin tone. A time-domain description of this phenomenon would be very clumsy. Another point of view is that the high-frequency components in the Fourier transform are responsible for the small (time-domain) details of a signal.


3.7 The uncertainty principle
In section 3.5 we found the expression (3.5c) for the Fourier transform of an impulse of finite width Δ. For Δ → 0, the impulse becomes a Dirac delta, and F(Ω) becomes a constant, independent of Ω (i.e., we have a 'flat' spectrum). Conversely it is clear, both from the concept of the Fourier transform and from the symmetry of (3.6), that a Dirac delta δ(Ω-Ω0) in the frequency domain corresponds to a sinusoid of frequency Ω0 and infinite duration. As it appears, an f(t) with small width is always accompanied by a wide F(Ω) and vice versa. This suggests that there is a fundamental relationship between the widths δt and δΩ of both members of a Fourier pair. A problem here is that there is no obvious definition of 'width' that is meaningful and at the same time sufficiently general. A common width measure is the standard deviation of f²(t) or |F(Ω)|², the square being used because the standard deviation is only meaningful for a non-negative function (like a probability distribution). For arbitrary Fourier pairs it can be shown that the product δt.δΩ is at least of order 2π, or when expressed in terms of the 'period' frequency ν: δt.δν is at least of order 1. This fundamental property of the Fourier transformation has many consequences in practical signal processing, some of which we will discuss later on.

Example: Tone discrimination
We consider the problem of discriminating two nearby tones (for instance 440 Hz and 442 Hz) simultaneously present in an audio signal. This problem is important, for instance, in Doppler radar used for traffic speed surveillance. The uncertainty relation tells us that a signal of finite duration (or one being observed for a finite time) necessarily has a Fourier spectrum of finite width: δt.δν is at least of order 1. Thus, as we have a frequency difference of δν = 2 Hz in the given example, we should observe the signal for a time of order δt = 1/2 = 0.5 s at least, or about 220 cycles at the average frequency 441 Hz. This value is a lower bound to the observation time δt, which often has to be exceeded appreciably in order to get useful results.
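The tone discrimination example can be tried out numerically. The sketch below synthesizes the two tones, evaluates the magnitude of the Fourier sum at 440 Hz, 441 Hz and 442 Hz, and calls the tones 'resolved' when the spectrum at the midpoint is less than half of that at the tones themselves. The sampling rate, the equal amplitudes and phases, and this resolution criterion are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrum_magnitude(signal, fs, freq):
    """|(1/N) * sum x[k] exp(-i 2 pi freq k / fs)|: Fourier magnitude at one frequency."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return abs(total) / n


def two_tones(duration, fs=4000.0, f1=440.0, f2=442.0):
    """Sampled sum of two sinusoids observed for `duration` seconds."""
    n = int(round(duration * fs))
    return [math.sin(2 * math.pi * f1 * k / fs) + math.sin(2 * math.pi * f2 * k / fs)
            for k in range(n)]


def resolved(duration, fs=4000.0, f1=440.0, f2=442.0):
    """True if the spectrum shows a clear dip between the two tone frequencies."""
    x = two_tones(duration, fs, f1, f2)
    peak = min(spectrum_magnitude(x, fs, f1), spectrum_magnitude(x, fs, f2))
    dip = spectrum_magnitude(x, fs, 0.5 * (f1 + f2))
    return dip < 0.5 * peak


if __name__ == "__main__":
    for duration in (0.25, 2.0):
        print(duration, resolved(duration))
```

With an observation time of 0.25 s (below the bound of 0.5 s) the two tones merge into a single spectral bump; with 2 s they are clearly separated.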

3.8 Some properties of the Fourier transform

Symmetry properties: (3.8a)
if f(t) is a real function, then F(Ω) = F*(-Ω), hence |F(Ω)|² = |F(-Ω)|²;
if f(t) is even, then so is F(Ω): F(Ω) = F(-Ω);
if f(t) is odd, then so is F(Ω): F(Ω) = - F(-Ω);
if f(t) is real and even, then so is F(Ω) (cosine transform).

Linearity: (3.8b)
a.f(t) + b.g(t) ↔ a.F(Ω) + b.G(Ω).

Time-shift property: (3.8c)
f(t - t0) ↔ exp(- i t0Ω).F(Ω) = exp(i ϕ(Ω) - i t0Ω).A(Ω).

Frequency-shift property (modulation property): (3.8d)
F(Ω - Ω0) ↔ exp(+ i Ω0t).f(t).

Transform of a derivative: (3.8e)
$$\frac{d^n f(t)}{dt^n} \leftrightarrow (i\Omega)^n F(\Omega) .$$
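As an illustration of how such properties follow from the definition (3.6), the time-shift property (3.8c) can be derived in one line. The substitution variable u = t - t0 is introduced here for the derivation only.

```latex
\int_{-\infty}^{\infty} f(t - t_0)\, e^{-i\Omega t}\, dt
  = \int_{-\infty}^{\infty} f(u)\, e^{-i\Omega (u + t_0)}\, du
  = e^{-i t_0 \Omega} \int_{-\infty}^{\infty} f(u)\, e^{-i\Omega u}\, du
  = e^{-i t_0 \Omega}\, F(\Omega)
```

A shift in time therefore changes only the phase of the transform, by an amount proportional to Ω, and leaves the amplitude A(Ω) untouched.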

3.9 Linear Time Invariant (LTI) systems
A system S is a black box which a signal x(t) enters and which a signal y(t) leaves that is a modified version of x(t): y(t) = S{x(t)}. S is an LTI system if and only if the following conditions are met:

y(t + T) = S{x(t + T)} for any T (time-invariance); (3.9a)
if y1 = S{x1} and y2 = S{x2} then S{x1 + x2} = y1 + y2 (additivity); (3.9b)
if y = S{x} then S{a.x} = a.y for any constant a (scaling invariance). (3.9c)

A system's output at any moment depends both on the current input and on the system's state (its memory). If the state is represented by a vector of n independent quantities, we say that we are dealing with an n-th order system. Such a system can be described by either a system of n first-order differential equations, or by a single n-th order differential equation. Both tell how the state evolves from its current value under the influence of the input signal (although the state vector may not always be explicitly visible in the equation). If the output signal y isn't simply equal to one of the state components, a separate (algebraic, not differential) output equation is needed.

3.10 Example of an LTI system: a low-pass filter
Consider a particular LTI system S for which the relation between input signal x and output signal y is described by the first-order linear differential equation

$$\tau\,\dot{y}(t) + y(t) = x(t) \qquad (3.10a)$$

or equivalently

$$\dot{y} = \frac{1}{\tau}\,(x - y) . \qquad (3.10b)$$

In these equations $\dot{y}$ is an abbreviation of dy(t)/dt.

Equation (3.10b) shows how the change in the state variable y depends on the current values of x(t) and y(t). In this case, the output variable y happens to be the same as the state variable. The parameter τ in equation (3.10) represents the system's characteristic time.


If we consider a second-order system

$$\frac{d^2 y(t)}{dt^2} + a\,\frac{dy(t)}{dt} + b\,y(t) = x(t)$$

we can expose the state vector by putting dy(t)/dt = z(t). Then we obtain a system of two first-order equations

$$\frac{dy(t)}{dt} = z(t)$$
$$\frac{dz(t)}{dt} = -a\,z(t) - b\,y(t) + x(t)$$

which clearly shows how the rate of change (i.e., the derivative) of the state vector (y,z) depends on the current state and the current input x. In this case too, the output y is identical to one of the components of the system's state.

Which output signal y(t) is produced by the first-order system S of (3.10) in response to some input signal x(t)? To answer this question we first calculate the response to a sinusoidal input signal with frequency Ω and real amplitude X:

$$x(t) = X\,e^{i\Omega t} .$$

By substitution we can easily verify that the output signal has the form

$$y(t) = Y\,e^{i\Omega t}$$

where Y is a complex quantity in general, which can be written as Y = |Y|.e^{iϕ}. In words: the system's output signal is a sinusoid of the same frequency as the input signal, although its phase and amplitude may be different. This is true for any LTI system. (Sinusoids are therefore called the eigenfunctions of LTI systems.) Some calculation shows that for system (3.10) the relation between Y and X is

$$|Y| = \frac{|X|}{\sqrt{1 + \Omega^2\tau^2}} ; \qquad \phi = \arctan(-\Omega\tau) . \qquad (3.10c)$$

The first expression in (3.10c) shows why this LTI system behaves as a low-pass filter: while for Ω = 0 the input and output amplitudes |X| and |Y| are equal, their ratio |Y|/|X| becomes ever smaller if Ω is increased. Formula (3.10c) gives the frequency response function of system S in terms of an amplitude- and a phase-response function. Alternatively, these can be combined in a single complex frequency response function

$$H(\Omega) = \frac{Y(\Omega)}{X(\Omega)} = \frac{1}{1 + i\,\Omega\tau} . \qquad (3.10d)$$
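The frequency response of this filter can be compared with a direct simulation. The sketch below evaluates the amplitude ratio and phase from (3.10c), and separately integrates the differential equation (3.10b) for a cosine input with a simple forward Euler scheme, taking the peak output after the transient has died out. The step size, settling time and function names are assumptions; forward Euler is used only because it is the shortest scheme to write down.

```python
import math


def response(omega, tau):
    """Amplitude ratio |Y|/|X| and phase shift according to equation (3.10c)."""
    return 1.0 / math.sqrt(1.0 + (omega * tau) ** 2), math.atan(-omega * tau)


def simulated_gain(omega, tau, dt=1e-4, settle=10.0, periods=5):
    """Integrate dy/dt = (x - y)/tau for x = cos(omega t); return the steady-state peak of |y|."""
    y, t = 0.0, 0.0
    t_settle = settle * tau                       # wait for the transient to vanish
    t_end = t_settle + periods * 2 * math.pi / omega
    peak = 0.0
    while t < t_end:
        x = math.cos(omega * t)
        y += dt * (x - y) / tau                   # forward Euler step of (3.10b)
        t += dt
        if t > t_settle:
            peak = max(peak, abs(y))
    return peak


if __name__ == "__main__":
    tau = 0.1
    for omega in (1.0, 10.0, 100.0):
        gain, phase = response(omega, tau)
        print(omega, round(gain, 4), round(phase, 4), round(simulated_gain(omega, tau), 4))
```

At Ω = 1/τ the gain has dropped to 1/√2 and the output lags the input by 45 degrees.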

For convenience we have written here both X and Y as functions of Ω. It is not difficult to see that in this particular example S belongs to the class of LTI systems. Now the fact that the behaviour of S for a sinusoidal input signal can be described by a frequency response function H(Ω) suggests that we can also use it for calculating the response to an arbitrary input signal x(t). For, according to Fourier, any x(t) can be regarded as a superposition of sinusoids, and an LTI system behaves additively with respect to such a superposition. In other words, between the Fourier transforms X(Ω) and Y(Ω) of our arbitrary x(t) and the corresponding response y(t), we expect a similar relation as between the sine amplitudes X(Ω) and Y(Ω) in the previous example:

Y(Ω) = H(Ω).X(Ω).

Formally, we can interpret H(Ω) as the Fourier transform, in the sense of (3.6), of some time function h(t):

$$H(\Omega) = \int_{-\infty}^{\infty} h(t)\,e^{-i\Omega t}\,dt .$$

After some calculations we then find the following relation between x(t), y(t) and h(t):

$$y(t) = \int_{-\infty}^{\infty} x(t')\,h(t - t')\,dt' = x(t) * h(t) , \qquad (3.11)$$

which is called the convolution product or convolution of x(t) and h(t). The significance of h(t) can be seen if we take as input signal the Dirac delta function: x(t) = δ(t). Then

$$y(t) = \int_{-\infty}^{\infty} \delta(t')\,h(t - t')\,dt' = h(t) , \qquad (3.12)$$

where we have used the sifting property of the δ-function. For obvious reasons, h(t) is called the impulse response of the system S. Like its Fourier transform H(Ω), the impulse response h(t) provides a complete description of an LTI system. Notice that the convolution operator is commutative, i.e. x(t) * h(t) = h(t) * x(t).

3.11 Low-pass filters
Consider the modulus (or amplitude) part A(Ω) of the frequency response function of a special kind of LTI system called a low-pass (LP) filter. (A(Ω) is often called the filter's frequency characteristic.) For an ideal LP filter (a 'cardinal' filter) one would expect that A(Ω) = 1 for |Ω| < Ωc, and A(Ω) = 0 for |Ω| > Ωc, where Ωc > 0 is the filter's cut-off frequency. In practice, such a filter cannot be realized from electrical components (resistors, capacitors, inductors) for various reasons. The most fundamental of these is the causality condition, which states that a cause cannot be preceded by its effect. This means for an LTI system that its impulse response function h(t) must be zero for t < 0. It can be shown that the impulse response of an ideal filter is necessarily symmetric around t = t0 (where t0 is the filter's delay time) and therefore cannot be causal. Examples of realistic low-pass filter frequency characteristics are shown in figure 3-5.
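The convolution (3.11) can be approximated numerically by a Riemann sum. The sketch below does this for a unit step input and the causal impulse response h(t) = (1/τ).exp(-t/τ) for t ≥ 0. This h(t) is the impulse response of the first-order system (3.10); it is not given explicitly in the text and is introduced here as an assumption. The expected output is the step response 1 - exp(-t/τ).

```python
import math


def impulse_response(t, tau):
    """Causal impulse response of the first-order low-pass filter (zero for t < 0)."""
    return math.exp(-t / tau) / tau if t >= 0 else 0.0


def convolve_at(x, h, t, dt, t_start=0.0):
    """Midpoint Riemann sum for y(t) = integral x(t') h(t - t') dt'.

    Assumes x is zero before t_start and h is causal, so the integration
    runs from t_start to t only.
    """
    n = int(round((t - t_start) / dt))
    total = 0.0
    for k in range(n):
        t_prime = t_start + (k + 0.5) * dt
        total += x(t_prime) * h(t - t_prime)
    return dt * total


if __name__ == "__main__":
    tau = 0.5
    step = lambda t: 1.0 if t >= 0 else 0.0
    h = lambda t: impulse_response(t, tau)
    for t in (0.5, 1.0, 2.0):
        print(t, convolve_at(step, h, t, 1e-3), 1 - math.exp(-t / tau))
```

Exchanging the roles of x and h in `convolve_at` gives the same value, in line with the commutativity of the convolution.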


3.12 High-pass and band-pass filters
A high-pass filter can in principle be obtained by constructing the complement of the output signal of a low-pass filter. A special example of an HP filter is the differentiator, for which A(Ω) = |Ω| (cf. the differentiation property of the Fourier transform, eq. 3.8e).

Figure 3-5 Examples of low-pass filter frequency characteristics. From [3.4]

Likewise, band-pass filters can be synthesized by combining LP and HP filters. In practice this is almost never done, but it shows that there is no need to formulate separate theories for the three classes of filters. For a symmetrical band-pass filter it can be shown, by means of the modulation property of the Fourier transformation, that its impulse response is a 'wave packet' with an envelope similar to the impulse response of the corresponding LP filter.


3.13 Deconvolution
In principle, filters can be designed which undo the effect of other filters. However, this process (called deconvolution or inverse filtering) is inherently unstable and cannot normally be realized without additional measures by the class of LTI filters discussed in this syllabus.

3.14 Non-linear systems
The superposition principle, an essential feature of LTI systems, is not valid for non-linear systems like y(t) = x²(t). Such systems have the characteristic property that 'more frequencies leave than enter the system'. Often non-linearity is an unwanted side-effect due to the technical limitations of electronic components, like transistors. In audio electronics, for instance, non-linearity leads to 'intermodulation distortion'. On the other hand, non-linear filters also have important 'constructive' applications. The median filter (a so-called order-statistic filter) is an important example.
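Both remarks on non-linear systems can be illustrated with a short sketch. The first part feeds a 3 Hz cosine into the squaring system y = x² and measures the amplitude of the 3 Hz and 6 Hz components before and after; since cos²(Ωt) = 1/2 + cos(2Ωt)/2, the output should contain 6 Hz (amplitude 0.5) and no 3 Hz. The second part applies a 3-point median filter to a sequence with a single outlier. The sampling rate, signal length and window width are assumptions for illustration.

```python
import math
from statistics import median


def amplitude(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz) in a sampled signal."""
    n = len(signal)
    c = sum(v * math.cos(2 * math.pi * freq * k / fs) for k, v in enumerate(signal))
    s = sum(v * math.sin(2 * math.pi * freq * k / fs) for k, v in enumerate(signal))
    return 2.0 * math.hypot(c, s) / n


def median_filter(signal, width=3):
    """Order-statistic filter: each sample is replaced by the median of its neighbourhood."""
    half = width // 2
    return [median(signal[max(0, k - half): k + half + 1]) for k in range(len(signal))]


FS = 100.0
x = [math.cos(2 * math.pi * 3.0 * k / FS) for k in range(100)]  # one second of a 3 Hz tone
y = [v * v for v in x]                                          # non-linear system y = x^2

if __name__ == "__main__":
    print("input :", round(amplitude(x, FS, 3.0), 6), round(amplitude(x, FS, 6.0), 6))
    print("output:", round(amplitude(y, FS, 3.0), 6), round(amplitude(y, FS, 6.0), 6))
    print(median_filter([1, 1, 9, 1, 1]))  # the outlier 9 is removed
```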

3.15 An introduction to stochastic signals
In the introduction to this chapter it was already mentioned that regarding a signal as a deterministic phenomenon is often not appropriate in practice. One reason is that really interesting, information-conveying signals are never exactly predictable. In other words: such signals have a random or stochastic nature. A given or measured signal x(t) is regarded as a realization of a stochastic process, also denoted x(t). The statistical description is concerned with the ensemble of all possible realizations of the process, although in reality we have only one realization: x(t). The expectation E{x} of x(t1) (that is, at a certain instant t1) is defined by

$$E\{x\} = \mu = \int_{\Xi_1} x\,p(x)\,dx$$

where p(x) is the pdf (probability density function) of x at time t1 and Ξ1 is the set of all values that x can take at time t1. The variance at t1 is the expectation of (x(t1) - µ)²:

$$E\{(x - \mu)^2\} = \sigma^2 = \int_{\Xi_1} (x - \mu)^2\,p(x)\,dx$$

Of course, p(x) cannot be inferred from the set Ξ1 (because we know only one of its members). Therefore p(x) is an a priori known distribution which summarizes our knowledge of the process at time t1.


Often it is assumed for more or less plausible reasons that p(x) is 'normal' or Gaussian:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

If our process x is stationary, then one single p(x) is valid for every instant t. If the process is ergodic as well, we may substitute time-averages for ensemble-averages:

$$\mu = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,dt, \qquad \sigma^2 = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} (x(t)-\mu)^2\,dt$$

Ergodicity implies, roughly speaking, that every single realization of the process goes through all the values the process can possibly generate. Note that in the above expressions the explicit dependence on p(x) has disappeared! In what follows we will always assume that the process is stationary and ergodic. Moreover we will assume that the mean is µ = 0. Then $\sigma^2$ is the mean power of the process x(t).

The behaviour of a process at time $t_2$ is never independent of what happens at a nearby instant $t_1$ (independence is only found in the idealized 'white noise' process). The autocovariance function $\gamma_{xx}(t_2 - t_1)$ shows the similarity (more precisely: the linear dependence) between the process and its replica time-shifted over $\tau = t_2 - t_1$. If the process is assumed to be stationary, $\gamma_{xx}$ depends only on the time-distance τ, not on the absolute values of $t_1$ and $t_2$. Thus we can write $\gamma_{xx}(\tau)$ instead of $\gamma_{xx}(t_2 - t_1)$. If the process is stationary and ergodic, we can write $\gamma_{xx}$ as a time average over a single realization:

$$\gamma_{xx}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,x(t+\tau)\,dt \qquad (3.13)$$

Notice that $\gamma_{xx}(0) = \sigma^2$. Often the autocorrelation function ρ is used:

$$\rho_{xx}(\tau) = \frac{\gamma_{xx}(\tau)}{\sigma^2} \qquad (3.14)$$

It has the following properties: $\rho_{xx}(0) = 1$; $\rho_{xx}(\tau) = \rho_{xx}(-\tau)$; $|\rho_{xx}(\tau)| \le 1$.

Besides the autocovariance function we have the cross-covariance function, which tells about the similarity between a stochastic process and a time-shifted replica of a different stochastic process. Under similar assumptions, the cross-covariance function can be written as

$$\gamma_{xy}(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} x(t)\,y(t+\tau)\,dt \qquad (3.15)$$
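The time averages (3.13) to (3.15) can be approximated from a single sampled realization of finite length. The following is a minimal sketch under that assumption: the integral is replaced by a sum over the available samples, the sample mean is subtracted first (the text assumes µ = 0), and the test signal (the sum of two successive white Gaussian noise samples, for which the autocorrelation at lag 1 should be close to 0.5) is a hypothetical example, not taken from the text.

```python
import random


def mean(x):
    """Time average of a finite realization."""
    return sum(x) / len(x)


def cross_covariance(x, y, lag):
    """Finite-sum estimate of gamma_xy(lag) for lag >= 0, after mean removal."""
    mx, my = mean(x), mean(y)
    n = len(x) - lag
    return sum((x[t] - mx) * (y[t + lag] - my) for t in range(n)) / n


def autocovariance(x, lag):
    """Finite-sum estimate of gamma_xx(lag), cf. eq. (3.13)."""
    return cross_covariance(x, x, lag)


def autocorrelation(x, lag):
    """Estimate of rho_xx(lag) = gamma_xx(lag) / sigma^2, cf. eq. (3.14)."""
    return autocovariance(x, lag) / autocovariance(x, 0)


random.seed(1)
w = [random.gauss(0.0, 1.0) for _ in range(20001)]   # white noise
x = [w[t] + w[t - 1] for t in range(1, 20001)]        # correlated process

rho0 = autocorrelation(x, 0)   # equals 1 by construction
rho1 = autocorrelation(x, 1)   # close to 0.5 for this test signal
rho3 = autocorrelation(x, 3)   # close to 0: no dependence beyond lag 1
print(rho0, round(rho1, 2), round(rho3, 2))
```

Because only a finite record is used, the values are estimates that fluctuate around the ensemble quantities; the fluctuation shrinks as the record gets longer.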


3.16 Power spectrum
In the frequency domain description of a stationary stochastic process, phase doesn't appear: obviously a shift of the time axis cannot make any difference in the properties of the process. Therefore the frequency domain description is restricted to the power spectrum P(Ω). Wiener and Khinchin have shown that P(Ω) is the Fourier transform of the autocovariance function:

$$P(\Omega) \leftrightarrow \gamma_{xx}(\tau) \qquad (3.16)$$

This relation can be seen as a definition of the power spectrum for stochastic signals.

Example: the 'random telegraph signal'
An oversimplification of a process y(t) producing Morse signals is based on the assumption that the timing of the on/off (0↔1) transitions can be described by a Poisson process. Then the probability of k transitions occurring in the time interval T is

$$\mathrm{Prob}(k,T) = \frac{(aT)^k}{k!}\,e^{-aT}$$

where a is the average number of transitions per unit of time. It is not difficult to show that in this case the autocovariance function (acf) looks like

$$\gamma_{yy}(\tau) = \tfrac{1}{4}\left(1 + e^{-2a|\tau|}\right)$$

By the transformation $x(t) = y(t) - \tfrac{1}{2}$ we obtain a process x(t) with zero average and acf

$$\gamma_{xx}(\tau) = \tfrac{1}{4}\,e^{-2a|\tau|} \qquad (3.17)$$

For 'technical' signals, exponential acf's are quite common. However, for 'natural' signals (like electrocardiograms, ocean waves, noise in semiconductor devices, music) the acf tends to behave like a low-order polynomial. This means that correlation is present over many time scales. Such signals are sometimes termed 'chaotic' or 'fractal'. The power spectrum of x(t) is the Fourier transform of $\gamma_{xx}(\tau)$:

$$P(\Omega) = \frac{1}{4}\int_{-\infty}^{\infty} e^{-2a|\tau|}\,e^{-i\Omega\tau}\,d\tau = \frac{a}{4a^2 + \Omega^2} \qquad (3.18)$$

With a = 10 s⁻¹ (representative for manual Morse telegraphy) the 3 dB width is 3.2 Hz and the 40 dB width is 318 Hz.
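The two widths quoted above follow directly from the closed form in (3.18). As a minimal sketch, assuming that 'width' means the frequency at which P(Ω) has dropped by the given number of decibels relative to P(0), the calculation is (the function names are hypothetical):

```python
import math


def power_spectrum(omega, a):
    """Closed form of eq. (3.18): P(omega) = a / (4 a^2 + omega^2)."""
    return a / (4 * a * a + omega * omega)


def width_hz(a, drop_db):
    """Frequency in Hz where P has dropped by drop_db decibels below P(0)."""
    ratio = 10 ** (drop_db / 10)            # power ratio P(0) / P(omega)
    omega = 2 * a * math.sqrt(ratio - 1)    # solve (4a^2 + omega^2) / 4a^2 = ratio
    return omega / (2 * math.pi)            # convert rad/s to Hz


a = 10.0                                    # transitions per second
w3 = width_hz(a, 3)
w40 = width_hz(a, 40)
print(round(w3, 1), round(w40))
```

The decibel values are power ratios, so a drop of 3 dB corresponds to roughly a factor of 2 in P(Ω) and a drop of 40 dB to a factor of 10000.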

3.17 References and further reading
3.1 A. Papoulis: The Fourier Integral and its Applications, McGraw-Hill 1962.
3.2 A. Papoulis: Signal Analysis, McGraw-Hill 1984.
3.3 D.C. Champeney: Fourier Transforms and their Physical Applications, Academic Press 1973.
3.4 A.V. Oppenheim, A.S. Willsky: Signals and Systems, Prentice-Hall 1983.
3.5 E.H. Dooijes: Syllabus Digitale Signaalverwerking, 1996.


4. Discrete-time signals and systems

4.1 Discrete-time signals
A discrete-time signal is an ordered sequence of numbers $\{x_n\}$. Each $x_n$ can be regarded as the value of a function x with integer argument n: $x_n = x[n]$. (If we are dealing with functions of integer argument, we use square brackets for including the argument. Hence X(ω) and X[k] are different functions!) We use the phrase discrete time to make it clear that the independent variable is of discrete nature, not the function value x. Though the sequence $\{x_n\}$ does not necessarily result from sampling a continuous-time signal, we will often denote the individual members of the sequence as samples. Some special discrete-time functions are:

the unit-impulse

$$\delta[n] = \begin{cases} 0 & \text{for } n \neq 0 \\ 1 & \text{for } n = 0 \end{cases}$$

the unit-step

$$u[n] = \begin{cases} 0 & \text{for } n < 0 \\ 1 & \text{for } n \geq 0 \end{cases}$$

[Figure: panels (a) and (b), the unit impulse and the unit step plotted against n.]

Notice that δ[n] = u[n] - u[n-1], and

$$u[n] = \sum_{m=-\infty}^{n} \delta[m]$$

Also important is the class of exponential functions $x[n] = c\,\alpha^n$. Examples: α = 1, -1, 2, -2, 1/2, exp(iω). In the last case we can also write x[n] = cos(ωn) + i.sin(ωn).

[Figures: the exponential sequence x[n] plotted against n for several values of α (panels include α = 1 and α > 1), and the sinusoidal sequences (a) x[n] = cos(2πn/8) and (b) x[n] = sin(2πn/10).]

Question A. Does a unique function x[n] exist for every ω?
Answer: no, because exp(iωn) = exp(i(ω + m.2π)n), m ∈ Z. Therefore we only have to pay attention to values of ω on an interval of length 2π, for instance (0,2π) or (-π,π).

Question B. Is x[n] a periodic function?
In that case, x[n] = x[n + N] would be true for all n and a certain value of N:

$$\exp(i\omega n) = \exp(i\omega(n+N)) = \exp(i\omega n)\,\exp(i\omega N)$$

The last expression equals the first one only if exp(iωN) = 1, and this is true only if

$$\omega N = k\cdot 2\pi, \quad\text{or}\quad \frac{\omega}{2\pi} = \frac{k}{N}, \qquad k \in \mathbb{Z}.$$

The answer to our question is therefore: yes, provided that ω/2π is a rational number. In that case the fundamental period of x[n] is N = k.2π/ω, provided that N and k have no factors in common. Now we can write down the set of all periodic exponential functions with period N as follows:

$$\varphi_k[n] = \exp(i\omega_k n) = \exp\!\left(i\,k\,\frac{2\pi}{N}\,n\right), \qquad k \in \mathbb{Z}$$

The $\omega_k$'s are multiples (harmonics) of the fundamental frequency 2π/N. However, $\varphi_k = \varphi_{k+mN}$, m ∈ Z, which implies that there are only N different functions $\varphi_k[n]$. Therefore, we consider only those functions $\varphi_k[n]$ indexed with k = 0, 1, ..., N-1.


For N = 8 this set is pictured in figure 4.1 below. Notice that each ϕk has both a real and an imaginary component.

Figure 4.1: Basis functions for N = 8. For k = 0, 1, ..., 7 the panels show Re ϕk = cos(2πkn/8) and Im ϕk = sin(2πkn/8).
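The properties of the basis functions discussed above can be checked numerically. The sketch below assumes N = 8 as in figure 4.1 and uses hypothetical helper names (`phi`, `fundamental_period`, `inner`); it computes the fundamental period of each ϕk by reducing the fraction k/N, verifies that ϕk and ϕk+N coincide, and evaluates the sum over n of ϕk[n] times the complex conjugate of ϕl[n].

```python
import cmath
import math
from fractions import Fraction


def phi(k, n, N):
    """Basis function phi_k[n] = exp(i * k * (2*pi/N) * n)."""
    return cmath.exp(2j * math.pi * k * n / N)


def fundamental_period(k, N):
    """Smallest period of phi_k: the denominator of k/N in lowest terms."""
    return Fraction(k, N).denominator


def inner(k, l, N):
    """Sum over n = 0..N-1 of phi_k[n] * conj(phi_l[n])."""
    return sum(phi(k, n, N) * phi(l, n, N).conjugate() for n in range(N))


N = 8
periods = [fundamental_period(k, N) for k in range(N)]
print(periods)

# phi_k and phi_{k+N} are the same function of n.
same = all(abs(phi(3, n, N) - phi(3 + N, n, N)) < 1e-9 for n in range(N))
print(same)

# Different basis functions are orthogonal; each has 'energy' N.
print(abs(inner(2, 2, N)), abs(inner(2, 5, N)) < 1e-9)
```

Only k = 1, 3, 5 and 7 have the full period 8; the other functions repeat earlier, but all of them satisfy x[n] = x[n + 8].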


4.2 Discrete Fourier Transform (DFT)
Theorem: any function defined on N consecutive integers can be exactly written as

$$x[n] = \sum_{k=0}^{N-1} X[k]\,\varphi_k[n] \qquad (4.1a)$$

$$X[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\,\varphi_k[-n] \qquad (4.1b)$$

with ϕk[n] = exp(i.k.(2π/N).n). This is the Discrete Fourier Transform (DFT), the discrete-time analogue of the Fourier series expansion of a continuous-time function defined on a finite interval. An important difference is that, in the discrete case, an exact representation is obtained with a finite number of terms, whereas in the continuous case any finite number of terms provides only an approximation in the least-squares sense. The DFT is very important in practical applications; the Fast Fourier Transform is an efficient algorithm (invented by Cooley and Tukey in 1965) for computing the X[k] from the x[n] and vice versa.

4.3 Discrete-time Fourier transformation (DTFT)
In a similar way as we arrived at the continuous-time Fourier transformation (CTFT), we can formulate the Fourier transformation for an infinite discrete-time sequence x[n]:

$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)\,e^{+i\omega n}\,d\omega, \qquad -\infty < n$$

[...] 1. This means that a certain amount of correlation exists between subsequent data values. The entropy of the first value of our data sequence would be 8 bits (all 256 values being equally probable). However, the entropy of each next symbol is only 1.5 bits. So by encoding the difference, we shrink from 8 bits to 1.5 bits. This example also illustrates the danger of decorrelation: if one difference is in error then all subsequent reconstructed data values will contain this error. For this reason decorrelation is always applied over relatively short blocks of data. Each block starts afresh with a full representation of the corresponding data value. In this way the possibility of error propagation can be reduced.

5.2 The capacity of a transmission channel
In chapter 4 we have shown that a band-limited signal, ranging in the frequency domain from 0 - W Hz can be exactly represented by 2W samples per second. Thus we can say that the number of degrees of freedom (d.o.f.) of a signal segment of duration T seconds is 2WT.


Notice that every d.o.f. should be associated with a continuously variable quantity. So it is not possible to derive from this how many bits of information C can be conveyed per second by this signal in practical cases; C would be an infinite number if each d.o.f. could take any value! Even the practical fact that the total power (energy per second) P of the signal should be restricted to some upper bound does not change this observation. To obtain a useful estimate of the capacity C we have to compare the signal power P to the power N of the noise which is unavoidably added to the signal while it propagates through its channel. The following expression for the channel capacity is a key result of Shannon's information theory:

$$C = W \log_2\frac{P+N}{N} \qquad (5.6)$$

under the assumption that we have thermal noise (i.e., white and Gaussian). Notice that P + N is the total power of the received signal. The derivation of this expression is based on the result [5.3] that the entropy of band- and time-limited thermal noise of power N is $\tfrac{1}{2}\log_2(2\pi e N)$ per d.o.f. Because there are 2WT degrees of freedom, the entropy per second is $W\log_2(2\pi e N)$. The entropy of the received signal is maximal if the source signal is itself thermal noise. Since the capacity of the channel is defined as the maximal difference between received signal entropy and noise entropy we have

$$C = W\log_2\bigl(2\pi e(P+N)\bigr) - W\log_2(2\pi e N) = W\log_2\frac{P+N}{N}.$$
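Expression (5.6) is easy to evaluate numerically. The following minimal sketch assumes that the signal-to-noise ratio P/N is given in decibels as a power ratio; the function name `channel_capacity` and the example values (a 2400 Hz channel at 35 dB) are chosen for illustration only.

```python
import math


def channel_capacity(bandwidth_hz, snr_db):
    """Capacity in bits per second, C = W * log2((P + N) / N)."""
    snr = 10 ** (snr_db / 10)          # power ratio P/N from decibels
    return bandwidth_hz * math.log2(1 + snr)


capacity = channel_capacity(2400, 35)
print(round(capacity))                 # roughly 28 kbit/s

# Doubling the bandwidth doubles the capacity; doubling the signal
# power at this SNR adds only about one bit per second per Hz.
print(round(channel_capacity(4800, 35)), round(channel_capacity(2400, 38)))
```

The logarithm makes bandwidth a much more effective resource than signal power once the signal-to-noise ratio is already high.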

For example, consider a voice-quality telephone line with W = 2400 Hz and a signal-to-noise ratio SNR = 35 dB (see footnote 5). Then 1 + P/N ≈ 3163, and the capacity is 2400 × 11.6 ≈ 28000 bits per second, which is the maximal capacity of today's dial-up lines. Actually the SNR of 35 dB was obtained here by calculating backwards from the known maximum bit rate. In reality the SNR should be better than 35 dB to achieve this, because the above conditions for maximal capacity are seldom satisfied. Evidently, these considerations don't tell us anything about the way we could in practice push this number of bits through the line! In section 5.6 we will briefly treat a few ways to accomplish this.

Footnote 5: Notice that the number of decibels corresponding to a power (not amplitude) ratio P1/P2 is $10\log_{10}(P_1/P_2)$.

5.3 Error-detecting and error-correcting codes
In the previous section it was shown that a transmission channel (a general term for any medium where data are stored temporarily) has a limited capacity because of the unavoidable occurrence of noise. This means, of course, that we can never have the guarantee that a stream of bits entering the channel will leave it unchanged: errors are always to be expected. For this reason coding systems have been developed that add redundancy to data instead of removing it. In practice we always find combinations of the two approaches: usually gross redundancy is first removed by the techniques discussed in the next chapter; then on a smaller scale (for instance for every group of four bits, as will be discussed below) redundancy is introduced again. We now give a number of ways this is often done in practice.

5.3.1 Parity checking
In simple short-distance transmission (for instance by the RS-232 protocol), to each group of seven bits representing an ASCII character another bit is added in such a way (i.e. by adding either a 0 or a 1 bit) that the total number of 1 bits is always even (see footnote 6). If at the receiving station an odd-parity byte is detected, one can be sure that an error has occurred. However it is not possible to deduce exactly which data bit was in error. One way to make this possible is the '4 out of 7' code, discussed in the next section.

5.3.2 Error correcting codes
As an example of error correcting codes we discuss the (7,4) Hamming code [5.1]. We split the bit stream 0000100010100100 into groups of four bits: 0000 1000 1010 0100 and complete each group by three bits b5, b6, b7 computed as follows:

b5 = b2 + b3 + b4 mod 2;
b6 = b1 + b3 + b4 mod 2;
b7 = b1 + b2 + b4 mod 2.

Thus our coded bitstream is 0000000 1000011 1010101 0100101. It is assumed that there is at most one bit in error in each group of 7. This assumption is usually justified: in dial-up lines the error rate is of order 1 in $10^5$, in high-quality lines less than 1 in $10^{10}$. In the case of no error we should have

b2+b3+b4+b5 = even; b1+b3+b4+b6 = even; b1+b2+b4+b7 = even

whatever the data bits b1, b2, b3 and b4 may be. Three parity checks must be performed by the receiver. If all three sums are found to be even there is no error; if just one sum is odd, the error is in the check bit in that sum. If exactly two sums fail, the error is in the data bit which is common to them, but not in the third. If all three sums fail, the error must be in b4. A more involved but related error checking technique is CRC (cyclic redundancy check).
CRC provides better safety against error bursts. Error correcting codes are extensively used not only in data transmission systems, but also in computer memory, data storage media and in the audio compact disc. See for instance [5.4] for more detailed information.
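The (7,4) Hamming scheme of section 5.3.2 can be sketched in a few lines. The check-bit equations and the rule for locating the erroneous bit are taken from the text; the function names and the lookup table `SYNDROME_TO_INDEX` are hypothetical, and the sketch assumes at most one bit error per group of seven, as the text does.

```python
def encode(data):
    """Append the check bits b5, b6, b7 to the data bits b1..b4."""
    b1, b2, b3, b4 = data
    b5 = (b2 + b3 + b4) % 2
    b6 = (b1 + b3 + b4) % 2
    b7 = (b1 + b2 + b4) % 2
    return [b1, b2, b3, b4, b5, b6, b7]


# Failed parity checks (1 = odd sum) -> 0-based position of the wrong bit.
SYNDROME_TO_INDEX = {
    (1, 0, 0): 4,  # only the first sum fails: check bit b5
    (0, 1, 0): 5,  # only the second sum fails: check bit b6
    (0, 0, 1): 6,  # only the third sum fails: check bit b7
    (0, 1, 1): 0,  # second and third fail: b1
    (1, 0, 1): 1,  # first and third fail: b2
    (1, 1, 0): 2,  # first and second fail: b3
    (1, 1, 1): 3,  # all three fail: b4
}


def decode(word):
    """Correct at most one bit error and return the four data bits."""
    w = list(word)
    b1, b2, b3, b4, b5, b6, b7 = w
    s1 = (b2 + b3 + b4 + b5) % 2
    s2 = (b1 + b3 + b4 + b6) % 2
    s3 = (b1 + b2 + b4 + b7) % 2
    index = SYNDROME_TO_INDEX.get((s1, s2, s3))
    if index is not None:          # (0, 0, 0) means no error detected
        w[index] ^= 1
    return w[:4]


stream = "0000100010100100"
groups = [[int(c) for c in stream[i:i + 4]] for i in range(0, len(stream), 4)]
coded = ["".join(str(b) for b in encode(g)) for g in groups]
print(" ".join(coded))             # 0000000 1000011 1010101 0100101

received = encode([1, 0, 1, 0])
received[2] ^= 1                   # simulate a transmission error in b3
print(decode(received))            # [1, 0, 1, 0]
```

With two or more errors in one group the three parity checks still point at a single position, so the decoder would then 'correct' the wrong bit; this is why the assumption of at most one error per group matters.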

Footnote 6: In fact, most systems leave the freedom to select either even or odd parity checking, or to omit the parity checking.


5.4 Transmission of analog signals: amplitude modulation
Amplitude modulation (AM) was for many years the most important technique for imposing audio signals on high-frequency (100 kHz - 150 GHz) sinusoidal carrier signals, in order to transmit them by radio. Exactly the same technique is used for transmitting many signals (for instance telephone channels) simultaneously over a cable. The principle is simply to vary the amplitude S of the carrier as a function of the audio signal a(t):

$$s(t) = S\cos\Omega_c t, \quad\text{with}\quad S = S_0\,(1 + m\,a(t)), \quad S_0 > 0,$$

where the modulation depth parameter m is chosen so that m.a(t) doesn't exceed 1 in absolute value, thus ensuring that S never becomes negative. For convenience we suppose that a(t) is a simple sinusoidal tone: $a(t) = A\cos\Omega_m t$. Then m must be chosen so that m.A


Appendix A: An introduction to complex numbers

113

Appendix B: Some basic statistics

116

Appendix C: Foutenvoortplanting (in Dutch)

121

In the 6th edition of this Sensor Informatics syllabus a number of errors (mainly typos) from the previous edition are rectified. Chapter 9 is new. EHD - December 1998. In the current pdf version the pages were renumbered, and two or three minor errors were corrected. EHD, October 2006.


1. Introduction

Sensor informatics deals with input and output of information systems from and to the real world. Traditionally it is the human observer who is the interface between the real world and the information system. Information systems that sense the outside world without the direct intervention of a human observer will become increasingly important. The techniques applied here can also be used when the sensor signals do not originate from physical quantities, but are stock-market quotations or dollar exchange rates. This kind of data can also be seen as valid sensor data. The present course surveys this field and provides an introduction to courses in computer vision, digital signal processing, coding, and pattern recognition. We will give a survey of the different aspects of sensor informatics and focus on the coherence between the different fields involved.

1.1 Aims and scope
In many traditional applications, input to an information system is given by striking keys on a keyboard. This input is basically textual information. Interaction devices such as a mouse give the possibility to supply graphical data to the information system, and also to interact with pictorial information. As such, this kind of interaction gives a new dimension to working with an information system on pictorial data, which is essential for e.g. CAD/CAM systems. Using an interaction device to choose from a menu or select items from dialogue boxes makes working with an information system easier, but does not create essentially new possibilities. Input obtained from sensors will play an increasingly important role in information systems of the future. In multi-media information systems, for instance, sound and pictorial information are typical examples of data obtained by sensing devices (microphones, video cameras), which are virtually impossible to enter through the keyboard or an interaction device.
Applications in which the sensor information is used to interact with or control the environment are traditionally studied in system theory and control theory. In many technical information systems the acquisition and processing of sensor information is needed to control a process. A simple example is a central heating controller, which switches on the heater when the temperature is below the desired level, and switches it off when the temperature is too high. In modern computerized heater controllers the measured temperature is compared to the heating profile and this heating profile is adjusted for the next day to minimize energy consumption. We are hardly aware of many other small digital control systems, like those used in modern tv sets to control the tuner frequency and the optimal adjustment of the set. Complex examples are the automatic pilot in airplanes, process control in the chemical industry, and many more. In general, the field of control theory is directed to the automatic control of processes in which real-time processing is essential.

Telecommunication and coding is another important field with which we will be involved. More often than not (in particular, in image handling) one has to deal with large quantities of data. Data has to be transported, stored, and retrieved efficiently. This requires coding of the data so as to remove any redundancy. In many cases, one can go even further because a certain loss of information can be tolerated in the coding process. Very efficient coding can be obtained by interpretation of the sensor data. For example, sending textual information as bit patterns by fax costs many more bytes than ASCII coding the characters (one byte per character), even if information about font and position is also transmitted in the latter case. Text recognition (often called OCR, optical character recognition), a special application of image recognition, is needed to achieve this. In telecommunication in particular, coding is also used for the opposite purpose: to add redundancy to the data in order to detect or compensate for information losses during transport or storage.

1.2 Unstructured versus structured data
Sensor information always gives only a limited view on the environment. Only that part of the physical reality is perceived for which sensors are connected to the system. When for instance a microphone and a video camera are attached to the system, sound signals and images are obtained; but beyond that, the system is blind to other possible information from the environment.
Compare this to the richness of human sensing. Besides hearing and vision, even the human skin is capable of 'measuring' a number of quantities. The most obvious of them is pressure; related quantities are shear and slip. Temperature and hence heat conduction and humidity are felt. Air flow is perceived through motion of hairs and cooling by forced evaporation. For all these quantities perception stops after some time: the skin is only sensitive to changes. Hence, for corresponding sensors dynamic properties must also be specified. Thus one has permanent, transient and periodic versions of a quantity.

Secondly, sensor information is inherently unstructured. We may record images on a photo-CD and play them back in an arbitrary order, but we cannot recall them automatically based upon their contents: the images are known to the system by their record number, not by their contents. Automatic retrieval based on the image data contents requires image data interpretation. Data structuring can be done by the computer if it is able to interpret the data; otherwise the structuring has to be done by a human observer.


Automatic recognition of objects present in images, or of speech, speakers or music in signals, is a well-known hard problem which at the current state of the art can only be solved within a limited context. The reason why it is difficult has to do with the large variability possible in the appearance of, for instance, a 3D object. Depending upon the direction of the illumination, the viewing angle, and how the object is placed in a 3D world, completely different images may be the result. A human is extremely good at this recognition task. Interpretation of an arbitrary video scene, which bears only the 2D projection information, is no problem at all for a human being. This may be interpreted as an existence proof that the information present in a video signal is sufficient for automatic understanding. However, we do not know how a human observer is capable of doing it. So, for us dealing with sensor data processing in information systems, automatic interpretation of an arbitrary scene is impossible for the time being. Only for very specific problems is the situation different, for example: interpretation of electronics diagrams, character recognition of machine-written text (OCR), and speech recognition with a limited vocabulary and a limited number of speakers. Often human interaction is still needed to verify the result, or to resolve detected errors. So the major problems (and research items) are not in data acquisition but in the interpretation of the sensor data.

1.3 Relation between the real world and the world model
As discussed in the previous section, sensor data is inherently unstructured. Automatic interpretation is possible only if we have some general knowledge about the sensor data. This knowledge can be represented by a model, which defines the context and what kind of information has to be extracted from the sensor data.
When dealing with pages of written text, the appropriate model will describe the text in terms of fonts, sizes, and lay-out; and only text will be recognized. When figures or photographs are present in the text, these will be discarded since they do not fit in the model. This world model is an abstraction from the real world, but sufficient for the purpose for which the system was developed. Another example is a geographical information system. Existing maps are updated based upon remote sensing images. The world model consists in such a case of the positions of roads, waterways, railway tracks, and the borders between the different regions. No other information present in the remote sensing images can be brought in because the world model cannot accommodate it. For the guidance of a mobile system to avoid collisions, a map of the surroundings is needed to demarcate forbidden regions where obstacles are present. Often, many considerably different representations are possible for the same world model. Which one is most suitable depends upon the operations which have to take place upon the data. Clever choices here can make dramatic improvements in the performance of a system.


1.4 Data exploration and model building
Often the world model is not known a priori, but has to be discovered from patterns and trends present in the sensor data. Besides statistical analysis and the like, visualisation is an important tool for exploring data. Visualisation should appeal to the human observer's unsurpassed ability to recognize patterns in the images he perceives in his daily life. Visualisation of signals, two-dimensional images, and projections of three-dimensional data is more or less straightforward. Interaction devices give the possibility to interact with the visualised data, to change grey-tones, colour palettes or the viewing angle. Animation gives the possibility to emphasize changes with time. The situation becomes much more difficult when we have to deal with data that has an inherent dimensionality higher than three. Such a situation occurs, for example, when we measure more than three features of objects: each object is represented by a point in a high-dimensional feature space. A simple illustration is the measurement of length and height of vehicles. When we plot the length along the horizontal axis and the height along the vertical axis, each vehicle is represented by one point in this two-dimensional space. The points in the space will cluster for cars and trucks. Different techniques exist to explore these higher dimensional spaces, like projecting the points to a subspace.

1.5 Applications of sensor informatics
Applications of sensor informatics can be divided into three mainstreams:
• storage, retrieval and transportation of sensor data;
• decision support based upon sensor data;
• interaction with the environment based upon sensor data.
The possibilities of modern computer equipment with regard to processing speed and memory capacity have made it feasible to manage large sensor-databases. A major application is the storage and retrieval of images or signals.
Examples are geographic information systems, medical image databases, hospital information systems, museum databases, and document image databases. Often, besides storage and retrieval, processing of the sensor data is needed. Processing is needed, for instance, to enhance images, to compare images with existing abstract descriptions of the image content such as maps, or to align images with each other (registration). A field of large interest is indexing images based upon pictorial information. Most applications require specific techniques for measuring certain features of an image: in cardiologic images, measurement of the size or distribution of blood-vessel structures is important; in forensic databases, images are classified according to certain features of fingerprints.

A second family of applications is decision support systems using sensor data. Pattern recognition and feature extraction techniques are important here. Well-known applications are optical character recognition and the automatic reading of postal checks. Also, systems for medical diagnosis and surveillance are examples in this category. An area of increasing importance is verification for fraud prevention and admittance. Issues in this context are speaker and writer verification. Industrial examples are the detection of exceptional states of machines, chemical processes or nuclear reactors.

In the third category of applications, the interpreted sensor data is used to control the environment by means of actuators. Examples are mail sorting systems based upon the postal code, mobile vehicles avoiding obstacles, or active robot vision systems focussing on moving targets. When the result of the action is perceived in turn by the sensing system, a closed loop is created. Such a loop can be used to control and stabilize a process, but closed-loop systems can become unstable. Instability is related to delays and amplification factors within the loop, so it is important to investigate how the feedback in the control loop is realized. Sometimes a feedback loop is created unintentionally. An example is the buying and selling behaviour of investors using the same investment advising program.


2. Sensor properties

To enable an information system to interact with the real world, we need to connect measuring instruments which observe the world and feed data into the system. In this chapter we will discuss various properties of measuring instruments and the way they can be put together. As sensing involves physics, this chapter will have a certain physical flavour. 2.1 Conversion of physical quantities into computer readable form A basic block diagram of a measuring instrument connected to an information system is sketched in figure 2-1. The first component is a sensor to convert the physical quantity we are interested in into an electrical signal. For instance, for sound we need a microphone to convert variations in air pressure into an electrical signal. For images we may use a video camera to obtain a signal representing the brightness in the image when it is scanned line by line.

Figure 2-1 Basic model of a measuring instrument. Blocks in the diagram: physical quantity, sensor, Analog-to-Digital Convertor, format (to bus), format (from bus), memory, sampling generation.

The next block represents the conversion of the electrical signal into digital numbers. This is realized by an Analog-to-Digital Convertor (ADC). The input range of the ADC is divided into a fairly large number of intervals of equal size $\Delta v$. The successive intervals are numbered to represent the quantized input. Thus, when the quantized signal has the integer value k, the corresponding value of the original signal was in the interval between $v_k$ and $v_{k+1}$:

$$v_k \le v < v_{k+1} \quad \text{with} \quad v_k = k\,\Delta v.$$


This process is illustrated in figure 2-2 for 8 quantization intervals. The number of quantization levels is in general a power of 2. When we have n bits available the number of quantization levels is $2^n$. For example, when the number of bits n = 8 there are 256 intervals, and the resolution is said to be 1/256 (of full scale)¹.

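[Figure: plot of v(t) against t, with quantization levels 0 to 7 (interval size ∆v) on the vertical axis and sample instants t = 1 to 6 on the horizontal axis.]

Figure 2-2 Quantization process of a 3 bit ADC with 8 quantization levels. The successive quantized values of v for t = 1 through 6 are: 1, 3, 5, 6, 5, 4.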
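The quantization rule of section 2.1 can be sketched in a few lines of Python. The function name `quantize`, the input range 0 to 8 and the sample values are assumptions chosen for illustration; they are not read off figure 2-2, although they were picked to give the same sequence of levels.

```python
import math

def quantize(v, v_min, v_max, n_bits):
    """Return the level k (0 .. 2**n_bits - 1) of a uniform n-bit quantizer."""
    levels = 2 ** n_bits
    delta_v = (v_max - v_min) / levels       # size of one quantization interval
    k = math.floor((v - v_min) / delta_v)    # k such that v_k <= v < v_{k+1}
    return min(max(k, 0), levels - 1)        # clip values outside the input range

# Hypothetical samples for a 3-bit ADC with input range 0 .. 8 (delta_v = 1)
samples = [1.4, 3.7, 5.2, 6.9, 5.5, 4.1]
codes = [quantize(v, 0.0, 8.0, 3) for v in samples]
print(codes)
```

Clipping at the ends of the range is a design choice of this sketch; a real ADC may instead signal an over-range condition.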

An important decision to be made is the number of quantization levels (so the number of bits) needed to represent the continuous signal. This number should be related to the noise (inaccuracy) present in the sensor signal: the inaccuracy introduced by the quantization process should be considerably smaller than the inaccuracy in the sensor signal itself. We will discuss this topic in section 2.3.

Another important issue is the question how frequently we should sample the continuous signal, as we can store only discrete events in the computer. In chapter 4 the Nyquist sampling theorem will be discussed; we will see that the sampling rate should be at least twice the maximum frequency present in the signal. In that case, the analog signal can be completely recovered from the sampled values. In fig. 2-1, the 'sampling generation' block takes care of the sampling. The sampling process can be quite complicated in video systems, where it has to be synchronized with the line-by-line scanning of the camera (section 2.5).

The quantized signal is formatted and sent over a bus system to the computer. Sometimes blocks of data are stored temporarily in the memory of the measuring instrument.

Relatively simple measuring systems can be put together easily using off-the-shelf equipment. The selection of the sensor depends strongly upon the application. There is a wide variety of different sensors for all kinds of physical quantities. Programmable equipment to sample an electrical signal and read it into the computer is available from many vendors. Such equipment can be directly interfaced to the computer system through a standard bus.

¹ In this syllabus we adhere to the convention that the better or higher resolution is expressed by a smaller quantity. Thus 'resolution' can be associated with the smallest detectable difference in the quantity being measured.


Top to bottom: schematic representations of a digital-analog converter (DAC), an analog-digital converter (ADC) based on the principle of successive approximation, and a 'flash' ADC.


2.2 Sensor properties

Let x be a physical quantity that we want to measure, and y the voltage output of the sensor. We would like to have a linear relation between the output of the sensor y and x:

$$y = S\,(x - x_0) \qquad (2.1)$$

The following properties of a sensor can now be defined.

Sensitivity: The factor S is called the sensitivity of the sensor. An equivalent definition is

$$S = dy / dx \qquad (2.2)$$

Zero-point: $x_0$ is called the zero-point. It is the value of the sensor input for which the output y is zero. For example, the zero-point of a temperature sensor is the temperature for which the output voltage is zero.

Offset: this is the output value of the sensor when the input is zero. So for example the offset of a light sensor is the output voltage when there is no light. For a linear sensor, the offset is obviously related to the zero-point (substitute x = 0 in 2.1):

$$y_{offset} = -S\,x_0 \qquad (2.3)$$

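[Figure: graph of the sensor output y against the input x between $x_{min}$ and $x_{max}$, indicating the slope dy/dx, the zero-point $x_0$ and the offset.]

Figure 2.3 Illustration of sensor properties: sensitivity S = dy / dx, zero-point and offset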
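Equations (2.1) to (2.3) translate directly into code. The sketch below is only an illustration: the function names and the numbers for the temperature sensor (0.5 V per degree, zero output at -20 degrees) are hypothetical.

```python
def sensor_output(x, sensitivity, zero_point):
    """Linear sensor response y = S * (x - x0), equation (2.1)."""
    return sensitivity * (x - zero_point)

def offset(sensitivity, zero_point):
    """Output for zero input, equation (2.3): y_offset = -S * x0."""
    return -sensitivity * zero_point

def estimate_sensitivity(x1, y1, x2, y2):
    """Sensitivity S = dy/dx from two calibration points, equation (2.2)."""
    return (y2 - y1) / (x2 - x1)

# Hypothetical temperature sensor
S, x0 = 0.5, -20.0
y_at_0 = sensor_output(0.0, S, x0)     # output at zero input
y_at_10 = sensor_output(10.0, S, x0)
print(y_at_0, offset(S, x0))           # both give the offset
print(estimate_sensitivity(0.0, y_at_0, 10.0, y_at_10))
```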

Another sensor property is its measurement range, given by the minimum and maximum value of the physical quantity ($x_{min}$ and $x_{max}$) for which there is a meaningful output of the sensor.

Accuracy: the accuracy is the uncertainty $\Delta x$ in the sensor value. Often it is given as a fraction of the measurement range: $\Delta x / (x_{max} - x_{min})$. When the accuracy $\Delta x$ increases with the value of x, it is customary to specify it as a relative accuracy: $\Delta x / x$ (in this situation, we usually have $x_{min} = 0$).

Non-linearity: the maximum deviation from the linear relation (2.1) is the non-linearity, mostly expressed as a fraction of the measurement range.
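As a rough sketch, the non-linearity can be estimated from a set of calibration points. Two assumptions are made here that the text does not spell out: the deviation is measured at the output and referred back to the input by dividing by S, and the measurement range is taken from the smallest and largest calibrated input. The function name and the calibration data are hypothetical.

```python
def nonlinearity(calibration, sensitivity, zero_point):
    """Maximum deviation from y = S * (x - x0), as a fraction of the range.

    calibration is a list of (x, y) pairs measured on the real sensor.
    """
    xs = [x for x, _ in calibration]
    x_range = max(xs) - min(xs)                       # x_max - x_min
    worst = max(abs(y - sensitivity * (x - zero_point))
                for x, y in calibration)              # largest output deviation
    return worst / (abs(sensitivity) * x_range)       # referred to the input

# Hypothetical calibration of a sensor with S = 0.02 and x0 = 0
points = [(0.0, 0.0), (50.0, 1.02), (100.0, 2.0)]
print(nonlinearity(points, 0.02, 0.0))   # about 0.01, i.e. 1% of the range
```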


We have mentioned the desirability of a linear relationship between the physical quantity and the sensor output. This was certainly true in the past when there were no computerized measuring instruments. Today, however, a linear relation makes life easy but is not strictly needed. As long as there is a monotonic functional relation f(x) between y and x over the measurement range, we can use the sensor. This relation must be known and the inverse $x = f^{-1}(y)$ must be stored in a table in the measuring instrument or in the computing system. For each measured output y we can look up what the corresponding value x would have been. Evidently, in the case of a non-linear relationship the sensitivity varies over the measurement range.

Input range: this is the range of input values for which the sensor operates according to the specifications. For input values beyond this range linearity may no longer be guaranteed, for instance.

Input limit(s): If the input value exceeds these limits the sensor will probably be damaged.

2.3 Uncertainty in the sensor value

When we measure repeatedly the output of a sensor (transducer) under the same conditions, the result will never be exactly the same. Small variations are present in the sensor signal called noise. Noise finds its origin in the physical properties of the sensor (thermal fluctuations and quantum effects), or results from external disturbances. The existence of noise is responsible for the fact that there is a basic and unavoidable uncertainty in the result of any measurement.

Let us consider a sensor with linear response, i.e. a relationship y = Sx exists between the output value y and the input value x (whether or not there is an offset is not relevant in the present discussion). Now, if the sensor adds noise with standard deviation $\sigma_{onoise}$ to its output signal, then it seems as if we are looking with a noise-free sensor at an input signal containing noise with standard deviation $\sigma_{inoise} = \sigma_{onoise} / S$ (see Appendix B).
Let $x_{min}$ and $x_{max}$ be the limits of the sensor's input range. The sensor can only discriminate between input signal values differing by an amount of order $\Delta x = \sigma_{inoise}$. The sensor's dynamic range R, expressed in decibels, is the logarithm of the number of virtual accuracy steps $\Delta x$ corresponding to the input range $x_{max} - x_{min}$:

$$R = 20 \log_{10} \frac{x_{max} - x_{min}}{\Delta x} \ \text{dB} \quad \text{with} \quad \Delta x = \sigma_{inoise}. \qquad (2.4)$$

A similar quantity can be defined for a signal s containing noise. If we compare the standard deviation $\sigma_{noise}$ of the noise in the signal to the signal range $s_{max} - s_{min}$, we obtain the signal-to-noise ratio (S/N) as

$$S/N = 20 \log_{10} \frac{s_{max} - s_{min}}{\sigma_{noise}} \ \text{dB}.$$

Consider now the output signal of our noisy sensor while sensing a noise-free signal. It seems as if we are looking with a noise-free sensor to an input signal with effective signal-to-noise ratio

$$S/N = 20 \log_{10} \frac{s_{max} - s_{min}}{\sigma_{inoise}} \ \text{dB}.$$

Evidently, this signal-to-noise ratio is maximal when $s_{max} - s_{min}$ is as large as possible, i.e. when the signal's amplitude matches the sensor's input range $x_{max} - x_{min}$. In this case, the signal-to-noise ratio of the sensor's output signal y equals

$$S/N = 20 \log_{10} \frac{x_{max} - x_{min}}{\sigma_{inoise}} = 20 \log_{10} \frac{y_{max} - y_{min}}{\sigma_{onoise}} \ \text{dB}.$$
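The following sketch evaluates the input-referred noise and the dynamic range of equation (2.4) for a made-up sensor, and checks that the ratio comes out the same whether it is computed at the input or at the output. All numbers (sensitivity 2 mV per unit, output noise 0.1 mV, input range 0 to 50 units) and the function names are assumptions for illustration only.

```python
import math

def input_referred_noise(sigma_out, sensitivity):
    """sigma_inoise = sigma_onoise / S for a linear sensor y = S * x."""
    return sigma_out / sensitivity

def ratio_db(span, sigma):
    """20 * log10(span / sigma), used for both dynamic range and S/N."""
    return 20.0 * math.log10(span / sigma)

# Hypothetical sensor
S = 2.0e-3            # sensitivity in V per unit of x
sigma_o = 0.1e-3      # standard deviation of the output noise in V
x_min, x_max = 0.0, 50.0

sigma_i = input_referred_noise(sigma_o, S)
R = ratio_db(x_max - x_min, sigma_i)                 # dynamic range, eq. (2.4)
snr_out = ratio_db(S * (x_max - x_min), sigma_o)     # same ratio at the output
print(round(R, 1), round(snr_out, 1))
```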

In order to translate the sensor's output voltage y into a sequence of digital numbers, suitable for being processed in a computer, an analog-to-digital converter (ADC) is used (section 2.1) which at fixed time intervals produces a number representing the quantization level which best approximates the current y value. An ADC introduces so-called quantization noise whose effective standard deviation $\sigma_{qnoise}$ is (Appendix B):

$$\sigma_{qnoise} = \frac{\Delta}{\sqrt{12}} \quad \text{with} \quad \Delta = \frac{y_{max} - y_{min}}{2^n} \qquad (2.5)$$

Here $\Delta$ is the discretization interval; n is the number of bits used by the ADC; $y_{min}$, $y_{max}$ delimit the ADC's input range, which is taken here to coincide with the sensor's output range². In table 2.1 S/N values for n-bits A/D converters are given.

$2^n$     $(y_{max} - y_{min}) / \sigma_{qnoise}$     S/N ratio in dB
2         7                                            17
4         14                                           23
8         28                                           29
16        55                                           35
32        111                                          41
64        222                                          47
128       443                                          53
256       887                                          59

Table 2.1. S/N ratio for quantization noise
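The entries of table 2.1 follow from equation (2.5): the ratio of the ADC range to the quantization noise is $2^n \sqrt{12}$. A short sketch that recomputes the table (the function name `quantization_snr` is an assumption):

```python
import math

def quantization_snr(n_bits):
    """Return (levels, range / sigma_qnoise, S/N in dB) for an n-bit ADC."""
    levels = 2 ** n_bits
    ratio = levels * math.sqrt(12.0)     # (y_max - y_min) / sigma_qnoise
    return levels, ratio, 20.0 * math.log10(ratio)

for n in range(1, 9):
    levels, ratio, snr_db = quantization_snr(n)
    print(f"{levels:4d} {ratio:6.0f} {snr_db:5.0f}")
```

Each extra bit doubles the ratio, which adds about 6 dB to the S/N ratio.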

Evidently we want the quantization noise to be at most of the same order of magnitude as the sensor output noise. Thus, to estimate the appropriate number of bits for the A/D converter, we put $\sigma_{qnoise} < \sigma_{onoise}$.

² This is the ideal situation. In practice it can often only be attained by adapting the sensor to the ADC using an instrumentation amplifier, which also will contribute to the overall noise level of the sensor signal.

Example: A video camera, a brightness (b) to voltage (v) transducer, has according to the manufacturer's specifications a 50 dB S/N ratio under good conditions. Hence

$$50 = 20 \log_{10} \frac{v_{max} - v_{min}}{\sigma_{onoise}}, \quad \text{or} \quad \sigma_{onoise} = \frac{v_{max}}{316}$$

because $v_{min} = 0$ in this case (zero offset). For digitizing the video signal, we choose the effective discretization step size such that

$$\frac{\Delta}{\sqrt{12}} < \sigma_{onoise}, \quad \text{hence} \quad \Delta < \sqrt{12}\,\sigma_{onoise} = \frac{v_{max}}{91}.$$

Since $\Delta = v_{max} / 2^n$, this requires $2^n > 91$, so $n > \log_2 91 \approx 6.5$ and at least 7 bits are needed. By choosing 8 bits, a common practice in commercial video digitizers, the quantization noise is more than a factor two (about 2.8) smaller than the noise in the video signal. Notice that we have assumed throughout the calculation that all signals are properly matched: thus - possibly by using an automatic iris - we fit the CCD camera to the given lighting conditions, and by using an amplifier with properly chosen amplification factor we take care that the sensor's output voltage range matches the ADC's input range.
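The arithmetic of this example can be checked with a small sketch. It assumes, as the example does, that the ADC input range coincides with the signal range; the function names are hypothetical.

```python
import math

def bits_for_snr(snr_db):
    """Smallest number of bits n for which sigma_qnoise < sigma_onoise."""
    range_over_sigma = 10.0 ** (snr_db / 20.0)        # (v_max - v_min) / sigma_onoise
    min_levels = range_over_sigma / math.sqrt(12.0)   # we need 2**n > min_levels
    return math.floor(math.log2(min_levels)) + 1

def noise_ratio(snr_db, n_bits):
    """sigma_onoise / sigma_qnoise for an n-bit converter."""
    return (2 ** n_bits) * math.sqrt(12.0) / 10.0 ** (snr_db / 20.0)

print(bits_for_snr(50.0))                 # minimum number of bits for 50 dB
print(round(noise_ratio(50.0, 8), 2))     # how far the 8-bit noise lies below the sensor noise
```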

2.4 Sensor types

Radiant signals       light intensity, wavelength, polarization, phase, reflectance, transmittance
Mechanical signals    position, distance, velocity, acceleration, force, torque, sound pressure
Thermal signals       temperature, specific heat, heat flow
Electrical signals    voltage, current, charge, resistance, inductance, capacitance, dielectric constant, electric polarization, frequency, pulse duration
Magnetic signals      field intensity, flux density, moment, magnetization, permeability
Chemical signals      composition, concentration, reaction rate, toxicity, pH

Table 2.2 Physical properties in the signal domains after Middelhoek et al. [2.2]

There are sensors for measuring all kinds of physical entities: distance, small displacements, temperature, forces and torques, velocity, acceleration, flow and all kinds of radiation. Obviously, which sensors are used depends strongly upon the application. For a multi-media information system we need the possibility to input sound and images, so a microphone and a scanner or a video camera are needed. For a system to monitor air pollution and smog formation, the measurement of particles and certain gases in the air, together with the weather conditions, is important.


A sensor has to transform a signal from the outside world into an electrical signal. Following Lion [2.1] six different domains can be distinguished in these signals from the outside world: radiant signals, mechanical signals, thermal signals, electrical signals, magnetic signals and chemical signals. Physical properties of importance in these different signal domains are listed in table 2.2. For any sensor, the conversion from one signal domain to another is based on one of the many existing physical and chemical effects and measurement principles that have been developed. Because of the immense number of measuring principles and devices, reviews of this field often have an encyclopedic character.

Signals are carried by some form of energy. Sensors that transform this incoming energy into the electrical energy of the sensor output are called self-generating or active sensors. No additional source of energy is needed to obtain the measured sensor signal. Examples are a solar cell, converting light energy into an electrical signal for measuring illumination, or a piezo-electric microphone converting the mechanical energy of acoustical waves into an electrical signal.

When an additional energy source is needed for the operation of the sensor, we call the sensor a modulating or passive sensor. The energy source is modulated by the measured quantity. Examples are: a linear potentiometer (variable resistor) used for measuring translations; an angular position decoder, which counts the number of holes in a rotating disc by interrupting a light beam; a Hall-effect sensor measuring a magnetic field, in which case a current source is modulated by the magnetic field.

A sensor has in general a spatial resolution: it measures a certain physical quantity at a certain location. When we measure the temperature, we do so at (or around) the place where the sensor is.
As to their spatial extension, point, line and area sensors can be distinguished, which produce a single value, a profile or an image of the measured quantity.

It is usually important that sensors be robust, small and low-cost. With the development of low-cost microelectronic devices, new sensors can open new markets. Examples are sensors for detecting the quality of food; for measuring the consumption of gas and electricity; for the continuous inspection of the correct operation of all kinds of systems, such as street illumination; and for the identification of persons and goods.

In sensor technology, the material and the measurement principle used play an important role. In particular the development of solid-state sensors based upon silicon is promising (Middelhoek et al. [2.2]). The use of silicon not only makes it possible to apply the well-developed production methods of integrated circuits to sensor production, but also makes it feasible to combine the sensing and the processing of the sensor signal on a single chip. This makes it possible to improve the characteristics of a sensor at a much lower price and with better performance than with discrete components. Sensors combining sensing and processing are often called 'smart sensors'.

Conversion of an electrical signal into one of the other signal forms from table 2.2 takes place at the actuator or output-transducer side. Examples are a display tube, in which a conversion to radiant energy takes place, or a loudspeaker which transforms the electrical signal into the mechanical energy of acoustic waves. Output transducers have undergone continuous development as well. Here too there is a demand for robust and low-cost devices, as actuators represent the major cost factor in most systems.

Sensor principles

We will now briefly review the different principles used to create sensors for the five non-electrical signal domains, and then discuss three important cases in more depth: from the radiant domain the image sensors, and from the mechanical domain the measurement of position and sound.

Radiant signals

Electromagnetic radiation includes, besides visible light, also infrared and ultraviolet light, radio waves (including 'microwaves'), X-rays and gamma rays. They differ in wavelength, ranging from $10^4$ m for long radio waves to $10^{-14}$ m for gamma rays. The wavelength of visible light is between 400 nm (violet) and 700 nm (red). A different form of radiation is nuclear-particle radiation, which includes alpha, beta, and other particles. In this text we will concentrate on visible light.

Solid-state sensors for (visible) light are mainly based on the photoelectric effect, which converts light particles (photons) into electrical charge. The absorption of photons by the lattice of the sensor material (mostly silicon) creates electron-hole pairs, which upon collection realize the transformation of radiant energy into electrical energy. Examples of these sensors are photoconductors, photodiodes and phototransistors.

Mechanical signals

There is an important difference between sensors that measure position with or without mechanical contact with the real world. The measurement of positions in images using image processing techniques has made it possible to measure positions remotely. Various physical principles are exploited for measuring position or proximity, including inductive, capacitive, resistive and optical techniques.
Force and pressure cannot be measured directly. First a force or pressure has to be converted to a displacement, and the displacement can be measured with one of the techniques described above.

Thermal signals

The resistance of a metal or a semi-conductor depends upon temperature. This relation is well known and is exploited for temperature sensing. Also the base-emitter voltage of a bipolar transistor is temperature dependent, and is used in many commercially available low-cost temperature sensors.


Self-generating temperature sensors can be obtained using the Seebeck effect. When two wires made from different metals are welded together at one point, and this junction point is heated or cooled with respect to the remaining parts of the so-called thermocouple, a voltage is present between the open ends. For small temperature differences, the voltage is proportional to the temperature difference.

Magnetic signals

Most of the low-cost magnetic sensors are based on the Hall effect. When a magnetic field is applied to a conductor in which a current flows, a voltage difference over the conductor results in a direction perpendicular to the current and the magnetic field. Because this effect is quite substantial in semi-conductors, semi-conductor Hall-plates are low-cost and used in many commercial devices. Many materials change their resistivity upon application of a magnetic field. This so-called magneto-resistance effect can also be exploited for building magnetic sensors.

Electrical signals

Many phenomena of interest for processing by an information system are electrical by nature: for instance biomedical signals (EEG, ECG), radio signals and many more. The equipment needed to convert, for instance, the tiny potential differences occurring at EEG electrodes to larger voltages, at a different impedance level, without interference from surrounding electromagnetic fields, while avoiding the risk of the patient's electrocution, forms a special category of sensors.

Chemical signals

For monitoring the environment, the measurement of specific components within gas mixtures is necessary. This has strongly motivated research into miniature low-cost (and possibly disposable) chemical sensors. The chemical signal can be directly converted to an electrical signal or first converted into an optical, mechanical or thermal signal, which is then converted into an electrical signal.
As an example, a sensor can be built for measuring the CO concentration in air, by determining the attenuation of an infrared beam. As CO absorbs IR light, the attenuation is a measure for the concentration.

Many chemical sensors are based on the measurement of the change of the conductivity or the dielectric constant of a chemical when it is exposed to a gas or electrolyte. Such a material can be a metal oxide. For instance, the electrical conductivity of heated tin dioxide changes with the concentration of methane. In this way a sensor for the presence of gas can be built. Also many organic materials, when exposed to a gas, change their conductivity. However, since the conductivity of these materials is very low, they are hard to use. Chemical sensors exist for the measurement of many gases such as carbon monoxide (CO), carbon dioxide (CO2), oxygen (O2) and ozone (O3). Also sensors for humidity and acidity (pH) belong to this type. A disadvantage of most chemical sensors is that they are not sensitive to just one chemical measurand but usually respond to many, which makes it necessary to use these sensors under well-defined conditions.

An important class of chemical sensors are biosensors. One type of biosensor is the acoustic biosensor [2.4]. In such a sensor a vibrating quartz crystal is coated with a biochemical which is specific for the matter to be detected. When this coating, such as an antibody, binds to the matter to be measured, such as an antigen, the mass of the coating increases. This leads to a change in the resonance frequency of the quartz crystal, as the resonance frequency is directly related to the mass. The frequency change can be measured accurately. In this way a sensitive and specific sensor can be realized.

2.5 Image sensors

The requirements with respect to absolute accuracy are in general less strict for image sensors than for measurement sensors. The information is however much more complex. Image sensors are built as an array of brightness sensors, which are electronically scanned. The spatial extension of the elementary sensors should be small, so as to avoid overlap.

In fact there are three places in an imaging system where scanning can take place: in the illumination, in the object / sensor positioning and in the sensor itself. A scene can be scanned with a single light beam while the reflected light is measured using a single elementary brightness sensor. This technique is applied in laser scanners, to obtain both a brightness image and an image representing the distance to objects. This may involve slow mechanical scanning and is in general expensive. Scanning can also be obtained by moving the sensor over the scene. This method is applied in flat-bed scanners, where a line image sensor is combined with one-dimensional mechanical motion to access successive lines of the image. In an array image sensor the two-dimensional scanning process is completely electronic.

A good design of a vision system involves an optimal choice of illumination, optical system and sensor in relation to the material properties of the object or the scene to be measured [2.5, 2.6, 2.7]. When human perception is involved, the fact that physically measured wavelength and perceived colour are quite different things should be taken into account. We will now briefly discuss colour perception and then, in some more detail, video systems, because these, together with flat-bed scanners, are the most frequently used image input devices for information systems.


2.5.1 Colour perception

In general we are dealing with light that is composed of a mixture of different wavelengths. The question is how we perceive such a mixture. We can imagine that the human eye has three types of colour receptors, each with its own spectral response. One type has its maximum sensitivity for red light, one for blue and one for green. Depending upon the spectral composition and intensity of the incoming light, these three receptors are stimulated, leading to the perception of a certain colour. Psychophysical evidence shows that every perceivable colour may be generated by combining red, green and blue lights. However these combinations are not unique for a given colour. Spectral compositions giving rise to the same colour are called metamers. Hence for reproducing a perceived colour, it is not necessary to reproduce the original mixture. It is sufficient that the receptors are stimulated by a metamer of the original mixture.

2.5.2 Video norms

Video systems originate from the entertainment industry. This market has set the standards for video systems: the American EIA norm and the European CCIR norm. In the EIA norm a video image consists of 525 lines, with 30 image-frames/second. In the CCIR norm a video image consists of 625 lines, with 25 image-frames/second. To prevent flickering of the image during display due to the relatively low repetition frequency of the images, video systems are interlaced. This means that an image is split into two fields: one consisting of the odd image lines and the other consisting of the even image lines. So the lines of one field are displayed in between the lines of the previous field, and the resulting local repetition frequency in the image is twice as high as when complete frames were displayed.

[Figure: (a) interlacing, with lines 1, 3, 5, ..., 623, 625 forming the first (odd) field and lines 0, 2, 4, ..., 622, 624 forming the second (even) field; (b) effective scanned area.]

Figure 2-5 CCIR video system. Interlacing (a) and effective scanned area (b).
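Splitting a frame into its two fields and weaving them back together can be sketched as follows. This is an idealization: a frame is modelled as a list of whole lines numbered 1 to 625, so the two fields get 313 and 312 lines instead of the 312.5 lines each of the real CCIR signal. The function names are assumptions.

```python
def split_fields(frame):
    """Split a frame (a list of lines) into the two interlaced fields."""
    first = frame[0::2]     # lines 1, 3, 5, ... (odd field)
    second = frame[1::2]    # lines 2, 4, 6, ... (even field)
    return first, second

def weave(first, second):
    """Rebuild the frame by placing the lines of the second field in between
    the lines of the first field."""
    frame = []
    for i in range(len(first)):
        frame.append(first[i])
        if i < len(second):
            frame.append(second[i])
    return frame

# Each line is represented here by its line number only
frame = list(range(1, 626))
odd_field, even_field = split_fields(frame)
print(len(odd_field), len(even_field))
print(weave(odd_field, even_field) == frame)
```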


This is illustrated in figure 2-5a for the CCIR norm (with fields of 312.5 lines). The video signal represents the brightness of the image along the lines of the fields and also contains synchronization pulses, which indicate the beginning of a line and of a field. These synchronization pulses take up time in the video signal (the so-called retrace time, which is also needed for the display device to position the writing beam at the beginning of the next line). This results in a smaller effective scanned area than would be expected from the given number of lines and the line and frame times. This effective area, where real image data is transmitted, is shown in figure 2-5b and is 74% of the total time (and so of the area).
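The 74% can be reproduced from the CCIR figures quoted in section 2.5.4: a line period of 64 µs of which 51.7 µs is scan time, and 576 image lines out of 625. Multiplying the horizontal and vertical fractions is an assumption about how the number was obtained; the variable names are hypothetical.

```python
LINE_PERIOD_US = 64.0     # total time of one line
ACTIVE_LINE_US = 51.7     # horizontal scan time within one line
TOTAL_LINES = 625         # lines per frame in the CCIR norm
ACTIVE_LINES = 576        # lines that carry image data

horizontal = ACTIVE_LINE_US / LINE_PERIOD_US   # fraction of each line used
vertical = ACTIVE_LINES / TOTAL_LINES          # fraction of the lines used
effective = horizontal * vertical

frame_time_ms = TOTAL_LINES * LINE_PERIOD_US / 1000.0   # duration of one frame

print(f"effective area: {effective:.0%}")
print(f"frame time: {frame_time_ms} ms")
```

The frame time of 40 ms agrees with the 25 frames per second of the CCIR norm.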

2.5.3 Solid state video sensors

A video sensor has three important functions:
- light to charge conversion,
- spatial accumulation of charge carriers,
- signal reading.

A solid state video sensor consists of an array of photo-sensitive sites. Charges are created by the photoelectric effect, which frees electrons as a result of the illumination. The amount of charge accumulated at a photo site is a linear function of the local incident illumination and of the integration time. The scanning and signal reading is based on the principle of Charge-Coupled Devices (CCD), basically analogue shift registers. Small amounts of electrical charge called 'packets' are stored at specific locations in the silicon semiconductor material. These locations, called storage elements, are created by the field of a pair of gate electrodes close to the surface. By placing the storage elements close together with some overlap between adjacent elements, a charge packet can pass from one storage element to another. This transfer of a packet is realized by alternately raising and lowering the voltage on adjacent gate electrodes.

In figure 2-6 the lay-out is sketched of a popular CCD solid state video sensor using the frame transfer method. This sensor is divided into an image section and a storage section [2.15]. The accumulation of charges takes place in vertical CCD registers and the charges are transferred to the storage section (dashed) during the vertical retrace time. Then the accumulation starts again at the photo sites while the storage section is read out line by line. The storage section is shifted into the horizontal read-out registers line by line (lower section), from which the video signal is obtained after amplification and addition of the synchronization pulses.


Figure 2-6 Layout of a frame transfer solid state sensor
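The read-out principle can be imitated with a very idealized model: charge is proportional to illumination and integration time, each line of the storage section is loaded into a horizontal register, and the packets are shifted out one by one. Transfer losses, noise and saturation are ignored, and all names and numbers below are hypothetical.

```python
def integrate(illumination, integration_time):
    """Charge per photo site: linear in illumination and integration time."""
    return [[e * integration_time for e in row] for row in illumination]

def shift_out(register):
    """Shift a CCD register by one element. Returns the packet that leaves
    the register and the new register contents (an empty packet enters)."""
    return register[0], register[1:] + [0.0]

def read_out(storage):
    """Read the storage section line by line through a horizontal register."""
    signal = []
    for line in storage:
        register = list(line)                # line moved into the read-out register
        for _ in range(len(line)):
            packet, register = shift_out(register)
            signal.append(packet)
    return signal

# Hypothetical 2 x 2 sensor
illumination = [[1.0, 2.0], [3.0, 4.0]]
storage = integrate(illumination, 0.5)       # image section copied to storage
print(read_out(storage))
```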

The spectral response of a solid state sensor peaks around 800 nm. The decrease for shorter wavelengths is a result of the transmission properties of the electrodes covering the sensor. The decrease for longer wavelengths results from the deeper penetration of the infra-red photons into the silicon. This gives rise to charge carriers in the substrate which do not contribute to the charge collection of the photo sites. Another effect of the longer travel of infra-red photons is a decrease of the resolution for longer wavelengths: charge carriers at a photo site can result from illumination incident on neighbouring sites. This effect is reduced by the use of an infra-red blocking filter.

In a solid state sensor the spatial accumulation of charges is separated from the signal read-out. This makes it possible to use an accumulation time different from the read-out time. In the high-speed shutter option the accumulation time is reduced. This makes the sensor less sensitive, but because of the short accumulation time the motion blur can be considerably reduced. For example, the individual water drops of a waterfall become visible. We can also do the opposite: enlarge the accumulation time. This makes the sensor more sensitive and useful in bad illumination conditions. This enlargement is however limited by thermal noise. Therefore cooled solid state sensors are sometimes applied in low-light applications. Temperature is an important factor. The storage-related parameters degrade rapidly at temperatures above 70° C (thermal relaxation).

In video cameras for the consumer market, single-chip colour sensors are realized by glueing a colour filter onto the chip. This reduces the resolution of the sensor by a factor of 3. When colour is not important, a black and white camera gives the highest resolution for the same price! In professional video cameras three solid state sensors are used for the three primary colours, and there is no reduction in resolution.


Solid-state cameras have no distortion of the picture geometry, nor burn-in or lag. However, when a very bright spot is present in the image, the CCD registers onto which this spot is projected saturate and bright columns appear in the image. Solid state sensors are small, lightweight and mechanically rugged. Consumer cameras require a minimum illumination of around 1 - 3 lux.

Resolution: video sensors developed for the consumer video market have sizes around 600 x 576 pixels. The organization and set-up of the array sensors is of course largely determined by the video norms for this market. Special array sensors for image processing applications are also available. An example is the Megaplus camera [2.16] with square pixels and a resolution of 1340 x 1037 pixels (Megaplus is a trademark of the Videk Company).

Signal to noise ratio: This depends upon the illumination but ranges from 50 dB up to 64 dB in commercial devices.

Besides these common properties the following properties are also found in specifications of solid state sensors:

Total Photo Response Non-Uniformity (PRNU): The difference of the response levels between the most and least sensitive elements under uniform illumination.

Picture element defects: The number of defective photo sites in the sensor. In a consumer video solid state sensor 604 columns x 575 lines, or in total about 350,000 photo sites, are present. At this moment array sensors with fewer than 10 defects in an image are commercially available.

2.5.4 Video digitizers (frame-grabbers)

A video signal has to be digitized by a video digitizer before it can be processed by a computer. Several commercial video digitizers exist to input a video signal into a computer system. Digitizing an image frame of a CCIR video signal takes 40 ms. A sample frequency of 14.8 MHz is necessary to obtain square pixels (picture elements) in the CCIR system.
Because of the retrace time the effective scanning area (CCIR) is 768 pixels on a line and 576 lines in an image (for square pixels). This is illustrated in figure 2-5b. The line period of the CCIR system is 64 µs (15625 Hz) of which 51.7 µs is the horizontal scan time and 12.3 µs is the retrace time. Three main functions are present in a video digitizer: A/D conversion, synchronization and image storage. The video digitizer converts an analog video signal into digital values. The number of bits required depends on the signal-to-noise ratio of the image sensor. This ratio depends among other things upon the illumination and is in the order of 50 dB. This corresponds to the 8 bits present in most commercial video digitizers. The synchronization of the sampling instants of the video digitizer with the scanning of the video source is one of the most crucial parts of a video digitizer. When the video source is freerunning, the video digitizer has to adjust its sample clock to the external source, so that a fixed

number of sample points fall into each line (defined by the interval between two line-synchronization pulses in the video signal). When we want square pixels in the digital image of a CCIR-norm video signal, there must be 956 pixels on a line. This means that the sample clock cannot be fixed but must be adjusted to the video signal. In particular, when the video source is a video recorder, line and frame frequency may vary considerably and such an adjustment is essential. When the (solid-state) sensor device delivers not only a video signal but also its pixel (scan) clock, the A/D conversion can take place completely synchronously with the scanning of the photo sites, and each sample point in the digital image then corresponds to one photo site in the solid-state sensor.

The digitized image is stored in a video memory in the video digitizer. Often this video memory can also be displayed. When the video digitizer logically resides on a processor bus, the video memory may be mapped into the address space of the processor. Image processing may then take place directly on the image stored in the video memory. However, the peculiarities of such a video memory then have to be taken into account in all image processing routines. There are also video digitizers which use an interface bus to the computer system to transfer (parts of) images to the processor memory.

It is good to make a clear distinction between square pixels (photo sites) of a solid-state sensor and square pixels in a digital image. Square pixels of a solid-state sensor are a result of the geometry of the lay-out of the photo sites. The image values of the photo sites constitute the video signal at the rate of the pixel scan clock in the sensor. Square pixels in the digital image result from the fixed number of sample moments between two successive line pulses in the video signal, as defined by the rate of the sample clock in the video digitizer.
Only when the scan clock rate in the sensor is the same as the sample clock rate in the video digitizer does there exist a one-to-one relationship between a photo site in the sensor and a pixel in the digital image, and only then will a sensor with square 'pixels' produce square pixels in the digital image! For these sensors, the video digitizer needs the pixel-scan clock in addition to the video signal. When we have no pixel-clock connection, the sampling clock of the video digitizer defines the length of the pixels. So when this clock rate is 14.8 MHz we have square pixels in the digital image, even when the photo sites are rectangular. In this case a requantization of the photo sites occurs. When for instance the photo sites are larger than the pixel length defined by the digitizer clock, some photo sites are sampled twice and some are sampled only once. As the video signal in general passes through a low-pass filter within the solid-state sensor, an interpolation takes place, with requantization as a result.

[Figure 2-7 panels: CCD photo-site positions; analog video signal against time; sampled video signal against time; photo sites compared with the pixels of the digital image]

Figure 2-7 Requantization due to different pixel and sample clocks
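The requantization described above can be illustrated with a minimal sketch. It assumes a nearest-neighbour model (each digitizer sample simply reads the photo site under its centre) and ignores the low-pass filter; the function names are hypothetical, and the numbers 604 photo sites and 768 samples per line are the ones quoted in the text.

```python
def sample_to_site(num_sites, num_samples):
    """For each digitizer sample, return the index of the photo site under the sample centre."""
    return [int((k + 0.5) * num_sites / num_samples) for k in range(num_samples)]


def samples_per_site(num_sites, num_samples):
    """Count how many digitizer samples fall on each photo site of one line."""
    counts = [0] * num_sites
    for site in sample_to_site(num_sites, num_samples):
        counts[site] += 1
    return counts


# One line of a consumer sensor (604 photo sites) digitized into 768 square pixels.
counts = samples_per_site(604, 768)
print("photo sites sampled once: ", counts.count(1))
print("photo sites sampled twice:", counts.count(2))
```

Because there are more samples than photo sites, every photo site is read at least once, and 768 - 604 = 164 of them are read twice.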

2.5.5 Scanners

Scanners are used for digitizing photographs, drawings and hand-written or printed text. In the commonly used flatbed scanners, the image is electronically scanned across its width by a linear CCD array containing some 2500 photo sites (for a 300 dpi A4 scanner). Scanning in the other direction is done by moving the CCD array slowly parallel to itself underneath the glass plate on which the original image is placed. Typically the resolution of a desktop scanner is 300-400 dots per inch (dpi). Sometimes a scanner of this type has a provision for higher resolution (up to 1600 dpi); this is however artificially created by an interpolation algorithm, which evidently cannot increase the amount of information obtained at the basic resolution of the scanner. Colour scanners usually use three colour filters in combination with a single CCD array. The original is scanned three times in this case, once for each colour.

The resolution of desktop scanners is well matched to the capabilities of other 'desktop publishing' equipment. Laser printers with a basic resolution of 600 dpi can produce halftone images only by 'dithering' techniques, at a resolution effectively limited to 150 dpi. For the dithering calculation, two to four times as many input dots are needed. Also, a postcard scanned at 300 dpi with 24-bit colour gives rise to 6 Mbytes of data, which is just a manageable amount in terms of disk space and computation time on today's workstations. As discussed in section 2.5.3, an economy-class CCD array has a signal-to-noise ratio of 50 dB, corresponding to 8 bits of information per sample. This is sufficient for most desktop applications where 8-bit gray values or 24-bit colour is standard.

For high-quality reproduction work in the printing industry, a resolution of 400 dpi is insufficient; here flatbed or drum scanners are used with resolutions up to 3000 dpi.
One reason for using so high a resolution is that in the reproduction printing process the rasters corresponding to the various colours have to be shifted and rotated mutually to prevent

smearing and the appearance of moiré patterns. In order to make the scanner's high spatial resolution effective, the dynamic range of the colour channels should be increased as well. In industrial flatbed scanners, the use of selected CCD arrays and a very stable mechanical construction leads to 10-12 bits per colour channel. It is desirable to use a CCD array whose individual photosites are equal within a quantization step; otherwise unwanted parallel lines will appear in the scanner's (virtual) output image. Unfortunately this is difficult to obtain in high-resolution scanners. One can cope with this problem by a software calibration process based on the signal resulting from scanning a test image of uniform density. In industrial drum scanners, this problem is avoided. The image is scanned by a laser beam illuminating a rotating drum on which the picture is mounted. Photomultipliers are used for measuring the reflected light with a dynamic range of 120 dB corresponding to 20 bits per colour channel.
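The figures quoted in this section (2500 photo sites for A4 at 300 dpi, 6 Mbytes for a postcard, 50 dB for 8 bits, 120 dB for 20 bits) can be checked with a few lines of arithmetic. The sketch below assumes that a dynamic range in dB is 20 log10 of an amplitude ratio, that an A4 page is 8.27 inch wide, and that a postcard measures 4 x 6 inch; the postcard size and the function names are assumptions, not values from the text.

```python
import math


def snr_db_from_bits(bits):
    """Amplitude ratio 2**bits expressed in dB (about 6 dB per bit)."""
    return 20 * math.log10(2 ** bits)


def bits_from_snr_db(snr_db):
    """Number of bits whose range 2**bits corresponds to the given ratio in dB."""
    return snr_db / (20 * math.log10(2))


def scan_bytes(width_inch, height_inch, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan of the given size and resolution."""
    pixels = round(width_inch * dpi) * round(height_inch * dpi)
    return pixels * bits_per_pixel // 8


print(round(snr_db_from_bits(8), 1))    # 8-bit video digitizer or desktop scanner
print(round(snr_db_from_bits(20), 1))   # 20-bit drum scanner channel
print(round(8.27 * 300))                # photo sites across an A4 page at 300 dpi
print(scan_bytes(4, 6, 300, 24))        # postcard (assumed 4 x 6 inch), 24-bit colour
```

Under these assumptions the postcard scan comes to roughly 6.5 million bytes, in line with the 6 Mbytes mentioned above.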

2.6 Mechanical signals

2.6.1 Position sensors

Several physical principles are exploited to create position sensors. In table 2.3 the most popular sensors are listed, which will be discussed briefly. For a more extensive discussion see for instance Reijers et al. [2.3].

The LVDT (Linear Variable Differential Transformer) and the resolver are based upon the principle of electromagnetic induction. With the LVDT linear displacements can be measured. A core is moved within a special transformer of which the output voltage varies linearly with the position of the core. With a resolver angular rotations can be measured of the rotary shaft on which it is mounted. Stator and rotor windings of the resolver are driven by a two-phase clock. The phase between the stator and rotor signal is measured and converted to an angular position.

An eddy current sensor is also based upon the inductive principle and is used for contactless measurement of the distance to a conducting object. It induces currents in a nearby conductor, which results in energy losses. This effectively reduces the sensor impedance, which varies almost linearly with the distance to the conductor. The effective range is short, about 10 millimeters. Although the accuracy is about 0.1%, differential motions of 0.03 mm can easily be detected.

sensor            principle    range                      accuracy                contact / non-contact   remarks
LVDT              inductive    1 mm - 30 cm               0.25 %                  contact
resolver          inductive    360°                       0.3°                    contact
eddy current      inductive    0.1 mm - 6 cm              0.5 %                   non-contact             only for conducting objects
LVDC              capacitive   2.5 mm - 25 cm             0.01 %                  contact
strain gauge      resistive    length x 10^-6             depends on electronics  contact                 mostly used for force measurement
ultra-sound       acoustical   30 cm - 10 m                                       non-contact             accuracy is temperature dependent
absolute encoder  optical      360°                       > 0.3°                  contact
PSD               optical      depends on optical system  0.01 %                  non-contact             both in 1D and 2D

Table 2.3. Position sensors

The LVDC (Linear Variable Differential Capacitor) has some resemblance to the LVDT but uses a capacitive method. As small differences in capacitance are difficult to measure, the LVDC requires careful mechanical design and expensive electronics.

Strain gauges are mostly used for force measurement. Gauges are made of electrical conductors, usually thin wire or foil, bonded to the beam or other object whose strain is being measured (strain - mechanical deformation - is, within the limits of elastic behaviour of the beam, linearly related to stress, or applied force). The resistance of a gauge varies with its deformation and so with the beam's strain. As the deformation is usually small, the change in resistance is small as well. Application of strain gauges is a highly skilled art. Gauges must be bonded to a clean surface with the proper type of cement. They must be aligned properly and temperature compensated.

Ultrasonic distance sensors are based upon the time-of-flight principle. An ultrasonic pulse is sent, and the time it takes before the reflected sound is received again by the transducer (in the meantime switched over to receiving mode) is a measure for the distance. Since the pulse travels to the object and back, the distance is computed by multiplying the velocity of sound in air by half the measured time interval. Because the sound velocity in air is temperature dependent, changes in temperature influence the measurement. Very cheap distance measurements can be realised in this way. The Polaroid company was the first to use this method for the distance measurement in its cameras.
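The time-of-flight calculation and its temperature dependence can be sketched as follows. The linear approximation for the speed of sound in air (331.3 m/s at 0 °C plus about 0.606 m/s per degree) is a commonly used rule of thumb and is an assumption here, as are the function names.

```python
def speed_of_sound(temp_celsius):
    """Approximate speed of sound in air in m/s (linear rule of thumb, assumed)."""
    return 331.3 + 0.606 * temp_celsius


def distance_from_echo(round_trip_s, temp_celsius=20.0):
    """Distance in metres: the pulse travels out and back, hence the factor 1/2."""
    return speed_of_sound(temp_celsius) * round_trip_s / 2.0


echo_time = 0.010  # a 10 ms round trip
for temp in (0.0, 20.0, 40.0):
    print(f"{temp:5.1f} degC -> {distance_from_echo(echo_time, temp):.3f} m")
```

For the same echo time the computed distance changes by a few percent between 0 °C and 40 °C, which is why an uncompensated ultrasonic sensor has a temperature-dependent accuracy.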

Absolute encoders are high-precision rotary devices that are mounted on a shaft of a rotary drive like a resolver. They encode the angular position by a binary code. This code is read from one or more discs with concentric rings of photographed or etched codes. In figure 2.8 this principle is illustrated for 16 positions with 4 code rings. A large encoder may have 10 to 20 rings and is quite expensive. Cheaper solutions can be found with incremental encoders by counting the number of steps. However, in this case no absolute position is obtained.

Figure 2.8 Absolute encoder for 16 positions in binary and Gray code
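The two codes of figure 2.8 can be generated with a short sketch. It uses the standard reflected binary (Gray) code, in which neighbouring positions differ in exactly one bit, so that a reading taken on the boundary between two sectors is off by at most one position. The function names are illustrative.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of the non-negative integer n."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# The 16 positions of a 4-ring encoder: position, binary code, Gray code.
codes = [binary_to_gray(p) for p in range(16)]
for position, gray in enumerate(codes):
    print(f"{position:2d}  {position:04b}  {gray:04b}")
```

In the plain binary code all four rings change at once between positions 7 (0111) and 8 (1000); in the Gray code only one ring changes at every step, including the step from position 15 back to 0.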

Position sensitive devices (PSDs). The position of an illuminating light beam can be calculated with a PSD. In the one-dimensional configuration illustrated in figure 2.9, a PSD consists of a rectangular (e.g. 34 x 2.5 mm²) diode. The backside of the diode is fully metalized and forms the return electrode. The frontside is the light-sensitive side with two contacts A and B. When a light beam hits the device a current is generated by the photoelectric effect. This current is split into two currents ia and ib to contacts A and B. The PSD is manufactured so that the surface resistance of the layer is extremely uniform (within 1%). So the resistances Ra and Rb are proportional to the lengths a and b:

[Figure 2.9 labels. Line PSD: incident light beam, electrodes A and B, distances a and b, total length D. Area PSD: intrinsic silicon, light-sensitive area, beam position (x,y), currents Ix1, Ix2, Iy1, Iy2]

Figure 2.9 Line and area PSD's

$$ \frac{i_a}{i_b} = \frac{R_b}{R_a} = \frac{b}{a} = \frac{D - a}{a} $$

and so:

$$ \frac{i_b - i_a}{i_b + i_a} = \frac{2a}{D} - 1 . $$

Thus, (ib - ia) / (ib + ia) varies linearly with a. In general the light beam has a certain diameter. The output of the PSD then represents the centre of gravity of the beam. The spectral response of a PSD ranges from 400 nm (blue) to 1000 nm (infrared) with a peak at 900 nm. The sensitivity is around 0.6 A/W. The resolution obtainable with a PSD is determined by the noise in the signals ia and ib. Attainable accuracies are in the range of 1 : 10^4. The influence of dark current and environmental light can be largely reduced by the use of pulsed light.
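Solving the relation (ib - ia) / (ib + ia) = 2a/D - 1 for a gives a = (D/2)(1 + (ib - ia) / (ib + ia)). A minimal sketch of this calculation for a line PSD follows; the function name and the example currents are illustrative only.

```python
def psd_position(i_a, i_b, length_d):
    """Beam position a on a line PSD of length D, from the two electrode currents.

    Follows from (i_b - i_a) / (i_b + i_a) = 2a/D - 1.
    """
    total = i_a + i_b
    if total <= 0:
        raise ValueError("no photocurrent: beam is not on the detector")
    return 0.5 * length_d * (1.0 + (i_b - i_a) / total)


# A 34 mm PSD with i_a three times as large as i_b (arbitrary current units):
print(psd_position(3.0, 1.0, 34.0))
# Equal currents put the beam in the middle of the device:
print(psd_position(1.0, 1.0, 34.0))
```

Because only the ratio of the currents enters, the result does not depend on the intensity of the beam, only on its position.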

2.6.2 Scanning principles for distance images

The methods to obtain range or distance images (also called 2½ D images) are mainly based on triangulation. There are two approaches: active and passive. In the active approach a scanning light source is used in combination with an imaging system. In the passive approach two imaging systems are used (stereo vision) or a moving sensor system (multiple view technique). In particular, a multiple view method can give complete 3D images. In the stereo vision type of approach, the distances to scene points are calculated from the displacements of known (identified) scene points between the two images. Problems arise in the matching because of occlusion and multiple matches. An estimate of the shape can also be obtained from the shading or the texture in the image ('shape from' techniques). In the following we will restrict ourselves to active techniques and leave further discussion of passive techniques to courses on image processing.

[Figure 2.10 labels: laser, laser beam, object, lens, position detector]

Figure 2.10 Principle of triangulation. A difference in distance results in a displacement in the sensor image

When a scene is illuminated by a narrow light beam, the distance to the illuminated scene element can be calculated by triangulation. Only the position of the illuminated element is necessary to calculate the distance, so both image sensors and PSDs may be used. A range image can be obtained by a complete scan of the scene with a light beam. One dimension of the scanning can be provided by the movement of the object (or system). In that case only a one-dimensional range profile has to be calculated in a plane intersection across the scene, perpendicular to the direction of motion. The total image is obtained by combining the range profiles of the successive object positions. When the object does not move, an additional mechanical motion of the whole sensor system is necessary. Commercial systems based on these principles are available, but are in general slow.

2.6.3 Sound

A microphone responds to acoustical signals coming from different directions, but usually the sensitivity is direction-dependent. The sensitivity of a microphone also depends upon the frequency of the acoustical signal. In figure 2.12b the sensitivity of a common electret microphone is sketched. An electret microphone is based upon the following principle: a vibrating foil forms a capacitor with a second plate, and the varying capacitance due to the vibration of the foil is translated into a changing voltage because a permanent electrical charge is present, caused by an electret mounted on the fixed plate. (An electret is made by heating a dielectric material and then suddenly cooling it in the presence of a strong electric field.) The sensitivity of this microphone as a function of frequency is flat from 100 Hz to 5 kHz, and it decreases for higher and lower frequencies (with a peak for this specific microphone at 10 kHz). For most applications the frequency range of a microphone should cover the range of human hearing (20 Hz - 20 kHz).

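[Figure 2.12 axes. Top: polar plot of sensitivity in dB (0 dB, -5 dB, -10 dB) against direction angle (0°, 90°, 180°, -90°). Bottom: sensitivity in dB against frequency in Hz (10 Hz to 10 kHz)]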

Figure 2.12 Sensitivity of an electret microphone as function of the direction angle (top) and of the frequency of the acoustical signal (bottom)

2.6.4 Compression and expansion

In speech, small amplitudes are much more frequent than large amplitudes. Hence if the microphone signal is uniformly quantized (using equal distance Δv between the quantization levels of the ADC), many levels are seldom used. This can be improved either by choosing a non-uniform quantization level distribution (resulting in a higher density of levels in the lower part of the ADC's input range), or by compressing the input signal prior to sampling. The latter is often done (for instance in digital telephone systems) according to the so-called µ-law:

$$ y = \frac{\log(1 + \mu x)}{\log(1 + \mu)} $$

where x is the (positive) input voltage. Both x and y are normalized over the range (0,1) in this simplified version. With an 8-bit ADC and choosing µ = 255, the ADC's output SNR is roughly constant over a 40 dB range. For restoring the original signal, an expander is needed based on the inverse function. A comparable compression scheme is based on the A-law. Evidently, these companding techniques are not restricted to microphone signals.
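The compressor, its inverse (the expander) and the effect on a small input amplitude can be sketched as below. The expander x = ((1 + µ)^y - 1) / µ follows directly from inverting the µ-law formula; the 8-bit quantizer and the test amplitude 0.01 are illustrative assumptions.

```python
import math


def mu_compress(x, mu=255.0):
    """mu-law compressor for a normalized input 0 <= x <= 1."""
    return math.log1p(mu * x) / math.log1p(mu)


def mu_expand(y, mu=255.0):
    """Inverse of mu_compress: x = ((1 + mu)**y - 1) / mu."""
    return math.expm1(y * math.log1p(mu)) / mu


def quantize(v, bits=8):
    """Uniform quantizer on (0,1) with 2**bits levels."""
    levels = 2 ** bits - 1
    return round(v * levels) / levels


x = 0.01  # a small amplitude, 40 dB below full scale
uniform = quantize(x)
companded = mu_expand(quantize(mu_compress(x)))
print("relative error, uniform quantization:", abs(uniform - x) / x)
print("relative error, with companding:     ", abs(companded - x) / x)
```

For this small amplitude the companded path gives a much smaller relative error than direct uniform quantization, at the price of a somewhat larger error near full scale.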

2.7 Standards for measurement systems

For many years instrument manufacturers have worked to standardize the electrical and mechanical interface between instruments and computers. A well-known example is the HP-IB or IEEE-488 bus [2.22]. There are many stand-alone measuring instruments on the market equipped with an IEEE-488 interface. Connecting the instruments to a workstation and controlling them through this interface makes it easy to set up an automated measuring system. There are also cards (for instance audio and video digitizers of different vendors) for internal computer buses of widespread systems and workstations such as IBM compatibles (ISA- or AT-bus), SUN workstations (S-bus) and systems using the VME-bus.

Although the electrical and mechanical interfaces between instruments and computer systems are standardized, the messages over the interface are not. As a result many different command sets were developed, not only for different measuring instruments but even for the same type of measuring instrument from different vendors. Recently, Standard Commands for Programmable Instrumentation (SCPI) have been defined and are being adopted by an increasing number of instrument manufacturers [2.23]. The question arises whether a SCPI instrument from one vendor can be replaced by an instrument of another vendor. Unfortunately, complete interchangeability cannot be guaranteed as of today. Nevertheless, SCPI provides a high degree of consistency among instruments. The command to measure a frequency is the same whether the measurement is made by an oscilloscope or a counter.

A fundamental objective of SCPI is to provide a simple way to perform simple operations. The MEASure command is the easiest way to configure and read data from an instrument. When the program message (of which the lower-case letters are optional)

:MEASure:VOLTage:AC?

is received by a voltmeter, the meter will select settings and configure itself for an AC voltage measurement, initiate the measurement and return the result to the system controller. A user can specify characteristics of the signal measurement, such as the expected signal value or the resolution of the measurement, by adding parameters to the command. For example,

:MEASure:VOLTage:AC? 20, 0.001

instructs the meter to configure itself to make an AC measurement on a signal of around 20 volts with 0.001 volts resolution.

To provide direct control over an instrument's hardware, SCPI contains command subsystems that control particular instruments and settings. To define the commands used to provide this control, SCPI uses a generalized model of a programmable instrument shown in figure 2-13. The model defines where elements of the language must be assigned in the SCPI hierarchy. Major areas of signal functionality are shown as blocks. The signal routing block takes care of the routing of signals between an instrument's port and its internal signal functionality.
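The convention that the lower-case letters of a mnemonic are optional can be made concrete with a small sketch. It only manipulates the text of program messages and does not talk to any instrument; the helper names are hypothetical, and the rule that a node must be given either in its full long form or in its short form (nothing in between) is stated here as an assumption about the standard.

```python
def short_form(mnemonic):
    """Drop the optional lower-case letters: 'MEASure' -> 'MEAS', 'AC?' -> 'AC?'."""
    return "".join(ch for ch in mnemonic if not ch.islower())


def header_matches(definition, received):
    """True if every node of `received` is the short or the long form of `definition`."""
    def_nodes = definition.strip(":").split(":")
    rec_nodes = received.strip().strip(":").split(":")
    if len(def_nodes) != len(rec_nodes):
        return False
    for d, r in zip(def_nodes, rec_nodes):
        if r.upper() not in (d.upper(), short_form(d)):
            return False
    return True


def build_message(definition, *parameters):
    """Build a program message in short form, followed by optional parameters."""
    nodes = definition.strip(":").split(":")
    header = ":" + ":".join(short_form(n) for n in nodes)
    if parameters:
        return header + " " + ", ".join(str(p) for p in parameters)
    return header


print(build_message(":MEASure:VOLTage:AC?"))
print(build_message(":MEASure:VOLTage:AC?", 20, 0.001))
print(header_matches(":MEASure:VOLTage:AC?", ":meas:volt:ac?"))
```

The same definition string thus describes both spellings that a controller may send, which keeps programs short without making them unreadable.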

[Figure 2-13 blocks. Input path: signal in, signal routing, measurement function, format, to bus. Output path: from bus, format, signal generation, signal routing, signal out. Shared blocks: trigger, memory, display]

Figure 2-13. Generalized Instrument Model of SCPI

The measurement function block converts a physical signal into an internal data form that is available for formatting into bus data. It may perform the additional tasks of signal conditioning and post-conversion calculation. The signal generation block is responsible for conversion of data into physical signals. It may perform additional tasks of pre-conversion calculation and signal conditioning. The purpose of the trigger block is to provide an instrument with the capability to synchronize with external events. The purpose of the memory block is to hold data inside the instrument. While every programmable instrument contains memory, not all such instruments provide explicit control of this memory. The format block converts between data representations, especially for the data that are transferred over the external interface. An example is conversion of internal data formats into ASCII. The purpose of the display block is to control the display of the signals if the instrument is equipped for that purpose. SCPI is a major advance in providing a standard instrument vocabulary. It offers the user shorter programming time and programs that are easier to understand and maintain, and it greatly increases the likelihood of instrument interchangeability.

2.8 Virtual sensors

Often we won't be dealing with one single sensor, but with a more complex system, incorporating not only the sensor but also the processing of the sensor data and the control of sensor parameters. This makes it hard to define where in a complex system the sensing ends and the processing begins. This point can be illustrated with a distance image. In a distance image the values of the pixels give the distance to the closest object. It can be obtained with an acoustical imaging system, measuring the time-of-flight of a reflected sound pulse: a complex sensing system without complex processing. It can also be obtained with stereo vision techniques where we have two images of a scene from different viewpoints. From the disparities between the objects in the images the distance can be calculated. In this case substantial processing is required to find the disparities between the images and to convert these to distances. A third technique is based on a structured illumination of the scene, where both processing of the video image and active control of the illumination are required to obtain a distance image. Is the first approach a sensor and the others not? Should it depend upon the definition of what is measured whether we call the system a sensor or not?

2.8.1 Sensor model

A solution to model more complex sensing systems is the definition of virtual (or logical) sensors. A system for measuring a certain property is called a virtual sensor. A virtual sensor can be identical to the traditional sensor, converting a simple physical quantity into an electrical signal (and therefore sometimes called a transducer), but it can also be some complex processing routine. This approach opens the way to create many different virtual sensors through combinations of others, leading to a flexible and modular sensor system structure.

An important aspect of such a sensor model is that it is capable of handling the robustness of the system. When a human interprets a scene, he always has expectations about what may be present in the scene, what sort of objects could physically exist, and which constraints are set by the laws of physics, to mention a few aspects. A sensor (processing) model should be capable of dealing with expectations and uncertainty in the measurements. Obtaining reliable and robust sensor data interpretation is of great importance since erroneous interpretation of the sensor data may lead to unwanted actions in (autonomous) information systems. It is therefore a major area for study. Robustness can be obtained by measuring the desired quantity in different ways or with different sensors. By judging the consistency of the results, errors can be detected.
With statistical techniques the different measurements can be combined to obtain a best estimate of the quantity to be measured or to reject outliers in the measurements. When no completely different measuring strategy is used, and for instance the same sensor is used to repeat the measurement, the same error or exception can be present and will not be detected from the results alone. Detection is possible, however, when we use a priori knowledge of what to expect. Within this virtual sensor concept a mechanism has been incorporated to handle an 'erroneous' input to a virtual sensor. Every input is judged by an 'acceptance test' which results in the acceptance or rejection of the input. In case of rejection, the virtual sensor has a list of alternative virtual sensors available that can provide the same input. From this list it picks the next virtual sensor, which is then activated. A virtual sensor fails when its list of alternatives is exhausted.
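The acceptance-test mechanism can be sketched in a few lines. This is only one possible reading of the description above, not the authors' implementation: the class and function names, the range used in the acceptance test and the two example sources are all hypothetical.

```python
class SensorFailure(Exception):
    """Raised when a virtual sensor has exhausted its list of alternatives."""


class VirtualSensor:
    def __init__(self, name, sources, accept, process=None):
        self.name = name
        self.sources = list(sources)   # alternative providers of the same input
        self.accept = accept           # acceptance test applied to every input
        self.process = process or (lambda value: value)

    def __call__(self):
        for source in self.sources:
            try:
                value = source()
            except SensorFailure:
                continue               # this alternative failed itself; try the next
            if self.accept(value):
                return self.process(value)
        raise SensorFailure(f"{self.name}: list of alternatives exhausted")


def ultrasonic_range():
    return 25.0   # hypothetical faulty echo, outside the expected range


def stereo_range():
    return 2.4    # hypothetical result of a stereo vision computation


distance = VirtualSensor(
    "distance",
    sources=[ultrasonic_range, stereo_range],
    accept=lambda d: 0.3 <= d <= 10.0,   # a priori knowledge of what to expect
)
print(distance())
```

Because a virtual sensor is itself callable, it can appear in the list of sources of another virtual sensor, which gives the modular structure mentioned in the text.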

2.9 References

2.1 K. Lion: Transducers: problems and prospects. IEEE Trans. Industr. Electron. & Control. Instrum., IECI-16 (1969) pp 2-5.
2.2 S. Middelhoek, S.A. Audet: Silicon Sensors. TUD, Department of Electrical Engineering Et 05-31.
2.3 L.N. Reijers, H.J.L.M. de Haas: Flexibele Produktie Automatisering, deel III Industriele robots. Technische Uitgeverij De Vey Mestdagh BV., Middelburg.
2.4 R. Schasfoort: Chemische sensorontwikkeling bij TNO. Sensornieuws, vol 2 (1993) pp 8-10.
2.5 A. Novini: Before you buy a Vision System... Manufacturing Engineering, vol. 94 (1985) no 3, pp 42-48.
2.6 H.E. Schroeder: Practical illumination concept and technique for machine vision applications. Proc. Robots 8 (1984), pp 14-43.
2.7 R.A. Jarvis: A perspective on range finding techniques for computer vision. IEEE PAMI-5 (1983) pp 122-139.
2.8 H.R. Everett: Survey of collision avoidance and ranging sensors for mobile robots. Robotics and Autonomous Systems, 5 (1989).
2.9 S. Inokuchi, K. Sato, F. Mutsuda: Range-imaging system for 3-D object recognition. Proc. 7th Int. Conf. on Pattern Recognition (1984) pp 806-808.
2.11 S.T. Barnard, W.B. Thompson: Disparity analysis of images. IEEE PAMI, Vol. PAMI-2, No. 4, July 1980, pp 333-340.
2.12 Fairchild: CCD. The solid State Imaging Technology.
2.15 Philips: The frame-transfer sensor an attractive alternative to the tv camera tube. Philips Technical Publication 150, 1985.
2.16 Videk: Megaplus camera: CCD Camera for high resolution applications. Videk, New York.
2.22 K. Jenssen: VXIbus: A new interconnection standard for modular instruments. Hewlett-Packard Journal, Vol 40, no. 2, April 1989, pp 91-94.
2.23 Standard Commands for Programmable Instrument Manual, Version 1990.0, April 1990.
2.24 Owen Bishop: Practical Electronic Sensors. Bernard Babani, London 1991.
2.25 R. Pallas-Areny, J.G. Webster: Sensors and Signal Conditioning. Wiley 1991.

3. Continuous-time signals and systems

As we have seen in the preceding chapters, a sensor is used for observing some physical quantity over an interval of time. Hence, using a caliper to measure the diameter of a Dutch five-cent coin wouldn't be called sensing. We know beforehand that we will find the same value (21 mm) once and for all; in other words, continuing the measurement doesn't provide any information. We speak of sensing only when we expect to find information-bearing changes in the quantity being observed. Changes in the observed value may also depend on other quantities than time. For instance, the gray value of a photograph is a function of the coordinates of the point where it is being measured. On the target of a CCD camera, the illumination of a photo site is a function of both position and time.

The result of sensing is a signal: an information-conveying function of one or more independent variables. In the physical world, the independent variables (like time, position) are almost always continuous. Man-made signals sometimes have a discrete independent variable: the Dow Jones index is determined once every day and is undefined in between. The dependent variable (length, gray value) is often a continuously variable scalar quantity. Colour on the other hand is a vector-valued quantity (r,g,b); and a Morse signal has a discrete dependent variable (mark, space). The representation of a signal inside a digital computer is necessarily discrete in both the dependent and the independent variables.

To be able to manipulate signals, we need a mathematical description. Confining ourselves to scalar continuous-time signals x = f(t), an obvious method of description is to specify x for every t. Unfortunately this can be done only if we know the analytic form of the function f(t) beforehand: for instance we might know that x = a.sin(bt). But we see immediately that this signal is a trivial one (though somewhat less trivial than the coin diameter 'signal'): once a and b have been determined, the signal is known for all time and no information is being conveyed. Real, information-bearing signals essentially have some degree of unpredictability. Even then however, such signals are subject to certain constraints. For instance, any signal has limited duration and, equally important, has only finite detail: an audio signal doesn't contain 'frequencies' beyond 15 kHz (we say that it is band limited - in fact, any signal of physical origin is band limited). This kind of restriction ensures that a signal (and the message it conveys) can be described by a finite number of parameters. From information theory we know that this is a condition for information: the number of possible messages has to be finite, otherwise coding them is impossible.

This chapter is devoted to the question of how a given continuous-time signal can be parametrized. In chapter 4, the same topic is discussed for discrete-time signals. The theory can easily be generalized to other kinds of signals, like images.

3.1 Least-squares approximation of a function

Let f(t) be given on the interval (t1,t2). In what follows we will assume generally that t denotes time. We want to approximate f(t) as closely as possible by c.ϕ(t). Here ϕ(t) is a different function defined on the same interval. The value of the constant c is chosen so as to minimize the integral-square or L2 norm

$$ J(c) = \int_{t_1}^{t_2} \bigl( f(t) - c\,\varphi(t) \bigr)^2 \, dt . $$

We can calculate c by noting that in the minimum the derivative of the function J(c) should equal zero: dJ/dc = 0. This leads to

$$ c = \frac{\int_{t_1}^{t_2} f(t)\,\varphi(t)\, dt}{\int_{t_1}^{t_2} \varphi^2(t)\, dt} . $$

Example: ϕ(t) = 1 for t ∈ (t1,t2); arbitrary f(t). Then we have

$$ c = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} f(t)\, dt $$

or in words: c is the average of f(t) over the interval (t1,t2).

3.2 Orthogonal functions

f(t) and ϕ(t) are called orthogonal if their inner product

$$ (f,\varphi) = \int_{t_1}^{t_2} f(t)\,\varphi(t)\, dt $$

is zero. Assume that we have a collection of mutually orthogonal functions { ϕi(t) }, i ∈ Z, for which by definition the inner product satisfies

$$ \int_{t_1}^{t_2} \varphi_i(t)\,\varphi_j(t)\, dt = 0 \quad \text{if and only if } i \neq j . \tag{3.1} $$

This time we want to approximate f(t) by a linear combination (a weighted sum)

$$ \sum_{i=-\infty}^{\infty} c_i\,\varphi_i(t) . $$

Again we determine the coefficients ci by minimizing the L2 norm

$$ J(\ldots, c_1, c_2, c_3, \ldots) = \int_{t_1}^{t_2} \Bigl( f(t) - \sum_{i=-\infty}^{\infty} c_i\,\varphi_i(t) \Bigr)^2 dt , \quad i \in \mathbb{Z}. $$

Thus, by putting ∂J/∂ci = 0 and using (3.1) we obtain for the i-th coefficient

$$ c_i = \frac{\int_{t_1}^{t_2} f(t)\,\varphi_i(t)\, dt}{\int_{t_1}^{t_2} \varphi_i^2(t)\, dt} \tag{3.2} $$

independent of all other coefficients! This is called the expansion of f(t) into a series of orthogonal basis functions {ϕi(t)}. It sometimes helps to gain insight in these matters if we consider the basis functions as mutually orthogonal vectors in an infinite-dimensional space (called Hilbert space). Then we can regard a given function to be represented by another vector, and its Fourier coefficients as the projections of this vector onto the basis vectors. The length of a Hilbert space vector associated with the function f(t) (always defined on the same interval (t1,t2)) is, like in standard geometry, the square root of

$$ (f,f) = \int_{t_1}^{t_2} |f(t)|^2 \, dt . $$
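The coefficient formula of sections 3.1 and 3.2 can be tried out numerically. The sketch below approximates the inner products with a simple midpoint rule; the test function f(t) = t on (0,1), the step count and the function names are illustrative choices, not taken from the text.

```python
import math


def inner(f, g, t1, t2, steps=5000):
    """Inner product (f,g) on (t1,t2), approximated with the midpoint rule."""
    h = (t2 - t1) / steps
    return h * sum(f(t1 + (k + 0.5) * h) * g(t1 + (k + 0.5) * h) for k in range(steps))


def coefficient(f, phi, t1, t2):
    """Least-squares coefficient c = (f,phi) / (phi,phi)."""
    return inner(f, phi, t1, t2) / inner(phi, phi, t1, t2)


def f(t):
    return t


def one(t):
    return 1.0


def sine(t):
    return math.sin(2 * math.pi * t)


# phi = 1 gives the average of f; phi = sin(2 pi t) gives its projection on that sine.
c_const = coefficient(f, one, 0.0, 1.0)
c_sine = coefficient(f, sine, 0.0, 1.0)
print(c_const, c_sine)
# The two basis functions are orthogonal on (0,1), so each coefficient
# can be computed without knowing the other.
print(abs(inner(one, sine, 0.0, 1.0)) < 1e-9)
```

For this example the exact values are c = 1/2 for the constant function and c = -1/π for the sine, which the numerical results reproduce to several decimals.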

3.3 Fourier series expansion of a function defined on a finite interval

Consider a function f(t) on (t1,t2) with t2 - t1 = T. The set of basis functions

$$ \{ \sin(n\Omega t), \cos(n\Omega t) \}, \quad n \in \mathbb{N}, \quad \Omega = 2\pi/T, \tag{3.3} $$

constitutes an orthogonal system on the interval (t1,t2), since their mutual inner products satisfy the following for all n,m ∈ N:

$$ (\sin(n\Omega t), \sin(m\Omega t)) = \pi\delta_{nm}/\Omega $$
$$ (\cos(n\Omega t), \cos(m\Omega t)) = \pi\delta_{nm}/\Omega $$
$$ (\cos(n\Omega t), \sin(m\Omega t)) = 0 $$

where δnm = 1 if n = m, δnm = 0 if n ≠ m. If we have a function sin(Ωt) or cos(Ωt) we call Ω its (angular) frequency; it is measured in radians per second. Technicians commonly use the (period-) frequency ν = Ω/2π. This is the number of periods per second and is measured in Hertz (Hz). The representation of a function as a weighted sum of basis functions from the set (3.3) is called a Fourier series expansion. A Fourier series can be written down in various ways:

$$ f(t) = a_0 + \sum_{n=1}^{\infty} \{ a_n \cos(n\Omega t) + b_n \sin(n\Omega t) \} $$
$$ \phantom{f(t)} = \sum_{n=0}^{\infty} \{ a_n \cos(n\Omega t) + b_n \sin(n\Omega t) \} $$
$$ \phantom{f(t)} = a_0 + \sum_{n=1}^{\infty} c_n \cos(n\Omega t + \gamma_n) $$
$$ \phantom{f(t)} = a_0 + \sum_{n=1}^{\infty} d_n \sin(n\Omega t + \delta_n) $$
$$ \phantom{f(t)} = \sum_{n=-\infty}^{\infty} F_n\, e^{in\Omega t} \tag{3.4} $$

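with

$$a_0 = \frac{1}{T}\int_{t_1}^{t_2} f(t)\,dt \quad \text{(average of the function } f(t)\text{)}$$

$$a_n = \frac{2}{T}\int_{t_1}^{t_2} f(t)\cos(n\Omega t)\,dt , \qquad b_n = \frac{2}{T}\int_{t_1}^{t_2} f(t)\sin(n\Omega t)\,dt \quad \text{(inner products).}$$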

The factor 2/T results from the fact that

$$\int_{t_1}^{t_2} \sin^2(n\Omega t)\,dt = \int_{t_1}^{t_2} \cos^2(n\Omega t)\,dt = \frac{T}{2}$$

(recall that t2 - t1 = T). It is not difficult to see that the quantities c, γ, d, δ and F can be derived from a and b, and vice versa. For instance F0 = a0, Fn = (an - i bn)/2, F-n = (an + i bn)/2, n ≥ 1.

The original function f(t) can be continued periodically, and the same is true for f(t)'s Fourier series expansion. For the particular type of functions we are discussing here (defined on a finite interval or periodic) the Fourier series expansion embodies their frequency-domain representation. Sometimes Ω = 2π/T is called the fundamental frequency in f(t)'s Fourier series expansion. For n > 0, nΩ is called the n-th harmonic of Ω. Figure 3-1 gives some examples of Fourier series expansions.

3.4 Energy and power; power spectrum

The energy of f(t), t ∈ (t1,t2) with t2 - t1 = T is

$$E = \int_{t_1}^{t_2} |f(t)|^2\,dt ;$$

its power is P = E/T.

(power = energy per unit of time; units: watt = joule / second). The contribution to P of the Fourier component of f(t) with frequency nΩ = n·2π/T, n ≥ 1, is

$$P_n = \frac{1}{T}\,a_n^2 \int_{t_1}^{t_2} \cos^2(n\Omega t)\,dt + \frac{1}{T}\,b_n^2 \int_{t_1}^{t_2} \sin^2(n\Omega t)\,dt = \frac{1}{2}\,(a_n^2 + b_n^2) ,$$

while the constant term contributes P0 = a0². For the total power of f(t) the Parseval relation holds:

$$P = \sum_{n=0}^{\infty} P_n = \frac{1}{T}\int_{t_1}^{t_2} f^2(t)\,dt = a_0^2 + \frac{1}{2}\sum_{n=1}^{\infty} (a_n^2 + b_n^2) .$$

This relation shows that the distribution of power over the sine- and cosine-components, which evidently depends on the choice of the time origin, has no effect on the total power. The sequence {Pn} is called the power spectrum of f(t). It is a discrete ('line-') spectrum, since it is defined only for discrete values of the frequency. Because the sine- and cosine-terms corresponding to any harmonic are put together in the power spectrum, phase information is lost, and it is not possible to reconstruct f(t) from its power spectrum.
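As a numerical illustration of the coefficient formulas and of the Parseval relation, the sketch below computes the line spectrum of an odd square wave (the waveform of panel b in figure 3-1). The values T = 2 and A = 1, the midpoint-rule helper `integrate` and the cut-off at 99 harmonics are assumptions made for this example only.

```python
import math

T = 2.0                      # interval length (illustrative value)
A = 1.0                      # top-to-top amplitude (illustrative value)
OMEGA = 2 * math.pi / T      # fundamental angular frequency

def f(t):
    """Odd square wave taking the values +A/2 and -A/2."""
    return A / 2 if (t % T) < T / 2 else -A / 2

def integrate(g, t1, t2, steps=2000):
    """Midpoint-rule approximation of the integral of g over (t1, t2)."""
    h = (t2 - t1) / steps
    return h * sum(g(t1 + (k + 0.5) * h) for k in range(steps))

def fourier_coefficients(n):
    """a_n and b_n for n >= 1, as inner products with cos and sin."""
    a_n = 2 / T * integrate(lambda t: f(t) * math.cos(n * OMEGA * t), 0.0, T)
    b_n = 2 / T * integrate(lambda t: f(t) * math.sin(n * OMEGA * t), 0.0, T)
    return a_n, b_n

a0 = integrate(f, 0.0, T) / T
power_spectrum = [a0 ** 2]                   # P_0 = a_0**2
for n in range(1, 100):
    a_n, b_n = fourier_coefficients(n)
    power_spectrum.append(0.5 * (a_n ** 2 + b_n ** 2))

total_power = integrate(lambda t: f(t) ** 2, 0.0, T) / T
print(sum(power_spectrum), total_power)
```

The partial sum of the Pn stays slightly below the total power because the series is truncated; only odd harmonics contribute for this waveform.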

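[Figure 3-1: seven waveforms, panels a to g, each with the time origin t = 0 marked. Their Fourier series expansions are:]

a) $\frac{A}{2} + \frac{2A}{\pi}\left(\cos\Omega t - \frac{\cos 3\Omega t}{3} + \frac{\cos 5\Omega t}{5} - \ldots\right)$

b) $\frac{2A}{\pi}\left(\sin\Omega t + \frac{\sin 3\Omega t}{3} + \frac{\sin 5\Omega t}{5} + \ldots\right)$

c) $\frac{4A}{\pi^2}\left(\sin\Omega t - \frac{\sin 3\Omega t}{3^2} + \frac{\sin 5\Omega t}{5^2} - \ldots\right)$

d) $\frac{4A}{\pi^2}\left(\cos\Omega t + \frac{\cos 3\Omega t}{3^2} + \frac{\cos 5\Omega t}{5^2} + \ldots\right)$

e) $\frac{A}{\pi}\left(\sin\Omega t - \frac{\sin 2\Omega t}{2} + \frac{\sin 3\Omega t}{3} - \ldots\right)$

f) $-\frac{A}{\pi}\left(\sin\Omega t + \frac{\sin 2\Omega t}{2} + \frac{\sin 3\Omega t}{3} + \ldots\right)$

g) $\frac{2}{T}\left(\frac{1}{2} + \cos\Omega t + \cos 2\Omega t + \cos 3\Omega t + \ldots\right)$

with A top-to-top amplitude; T interval length; Ω = 2π/T

Figure 3-1 Examples of Fourier series expansions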

3.5 Example: f(t) is an impulse

In this example we consider an impulse of width 2Δ and height A = 1/(2Δ) (figure 3-2). We assume again that f(t) is given on a finite time interval of duration T. However, we could as well say that f(t) is a periodic function with period T. The Fourier series expansion is identical in both cases. In order to determine the Fourier coefficients {an} we choose the time origin (t = 0) to be halfway along our impulse. f(t) is an even function in that case; it can be written as a cosine series (i.e. in (3.4) the coefficients bn of the sine terms are all zero). Now we have, with Ω = 2π/T,

$$a_n = \frac{2}{T}\int_{t_1}^{t_2} f(t)\cos(n\Omega t)\,dt = \frac{4A}{T}\int_{0}^{\Delta} \cos(n\Omega t)\,dt \qquad (3.5a)$$

$$= \frac{2}{T}\,\frac{\sin(n\Omega\Delta)}{n\Omega\Delta} . \qquad (3.5b)$$

We now consider two limiting cases, A and B.

Case A: T is constant, Δ → 0 while the product 2A·Δ (i.e., the area under the impulse) remains constant = 1. The limit case impulse is often called the Dirac delta function, although it is not a function in the usual sense (in fact, it is a so-called distribution). In this case, the equation for an becomes an = 2/T in the limit, independent of n! In words: a δ-impulse function (periodic or on a finite interval) has a flat spectrum. This is illustrated in the last example in figure 3-1. To derive this we have used the well known fact that for x → 0, sin(x)/x → 1. The function sin(x)/x (figure 3-3) is so common in signal processing theory (and elsewhere) that it got its own name: sinc(x).

Case B: Δ is constant, T → ∞ (a finite impulse defined on an infinite interval). What happens now is not so easy to see. We can describe it qualitatively as follows: if T is made larger and larger, the lines in the spectrum of our finite-width impulse will come ever closer to each other (because their distance is Ω = 2π/T, cf. equation 3.5).

Figure 3-2 Impulse function

Figure 3-3 The function sin(x)/x


In the limit for T → ∞ the original line spectrum changes into a continuous spectrum F(Ω) which should be interpreted as a density function. That is, the contribution to the impulse's power by frequencies between Ω1 and Ω2 is proportional to the area under F(Ω) between the given limits. In the present case, F(Ω) is a sinc function similar to the envelope of the discrete spectrum (3.5b) in the original example:

$$F(\Omega) = 2A\Delta\,\mathrm{sinc}(\Omega\Delta) = \mathrm{sinc}(\Omega\Delta) , \qquad (3.5c)$$

where the last step uses 2AΔ = 1. F(Ω) is called the Fourier transform of our Δ-impulse.
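The behaviour of the coefficients (3.5) in case A can be checked with a few lines of code. This is only a sketch: the function names, the value T = 1 and the midpoint-rule integration are assumptions of the example, not part of the text.

```python
import math

def sinc(x):
    """sin(x)/x with the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def a_n_closed_form(n, T, delta):
    """Coefficient (3.5b) for an impulse of width 2*delta and height 1/(2*delta)."""
    omega = 2 * math.pi / T
    return 2 / T * sinc(n * omega * delta)

def a_n_numeric(n, T, delta, steps=10000):
    """Coefficient (3.5a): (4A/T) times the integral of cos(n*Omega*t) over (0, delta)."""
    omega = 2 * math.pi / T
    amplitude = 1 / (2 * delta)
    h = delta / steps
    integral = h * sum(math.cos(n * omega * (k + 0.5) * h) for k in range(steps))
    return 4 * amplitude / T * integral

T = 1.0
for delta in (0.1, 0.01, 0.001):
    # As delta shrinks, the first few coefficients all approach 2/T (flat spectrum).
    print(delta, [round(a_n_closed_form(n, T, delta), 4) for n in range(1, 6)])
```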

3.6 The continuous-time Fourier transformation (CTFT)

The Fourier transformation introduced in the previous section is applicable if we are dealing with the frequency representation of functions defined on the entire time-axis (-∞ < t < ∞) and satisfying certain conditions, the most important of which is that they have finite energy. This means in fact that they are localized more or less on the time axis. Thus, a periodic function cannot be described by a Fourier transform, only by a Fourier series as we have seen before. The general form of the so-called Fourier transform pair is

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\Omega)\,e^{+i\Omega t}\,d\Omega \quad \text{with} \quad F(\Omega) = \int_{-\infty}^{\infty} f(t)\,e^{-i\Omega t}\,dt \qquad (3.6)$$

F(Ω) is the frequency-domain representation of f(t), but we could say equally well that f(t) is the time-domain representation of F(Ω). We will often indicate a Fourier pair by the notation f(t) ↔ F(Ω). It is possible (but not completely trivial) to show that (3.6) is valid by substituting the second expression in the first one; evaluating the resulting integral then yields the identity f(t) = f(t).

It should be noted that either member of a Fourier pair can be a complex function. If F(Ω) is complex it is often useful to write it as the product of a phase factor and a real modulus or amplitude:

$$F(\Omega) = e^{i\phi(\Omega)}\,A(\Omega), \quad \text{with} \quad A(\Omega) = |F(\Omega)| , \quad \phi(\Omega) = \arctan\frac{\mathrm{Im}\,F(\Omega)}{\mathrm{Re}\,F(\Omega)} \qquad (3.7)$$

Frequency-domain representation is often the obvious way of describing a signal. For instance, the sound emitted by a bowed violin string is characterized by the fact that it contains a certain fundamental frequency (say, 440 Hz) plus a number of harmonics (880 Hz, 1320 Hz,...) the relative intensities of which account for the typical timbre of the violin tone. A time-domain description of this phenomenon would be very clumsy. Another point of view is that the high-frequency components in the Fourier transform are responsible for the small (time-domain) details of a signal.


3.7 The uncertainty principle

In section 3.5 we found the expression (3.5c) for the Fourier transform of an impulse of finite width Δ. For Δ → 0, the impulse becomes a Dirac-delta, and F(Ω) becomes a constant, independent of Ω (i.e., we have a 'flat' spectrum). Conversely it is clear, both from the concept of the Fourier transform and from the symmetry of (3.6), that a Dirac-delta δ(Ω-Ω0) in the frequency domain corresponds to a sinusoid of frequency Ω0 and infinite duration.

As it appears, an f(t) with small width is always accompanied by a wide F(Ω) and vice versa. This suggests that there is a fundamental relationship between the widths δt and δΩ of both members of a Fourier pair. A problem here is that there is no obvious meaningful and at the same time sufficiently general definition of 'width'. A common width measure is the standard deviation of f²(t) or F²(Ω), the square being used because the standard deviation is only meaningful for a non-negative function (like a probability distribution). For arbitrary Fourier pairs it can be shown that the product δt.δΩ is at least of order 2π, or when expressed in terms of the 'period' frequency ν: δt.δν is at least of order 1. This fundamental property of the Fourier transformation has many consequences in practical signal processing, some of which we will discuss later on.

Example: Tone discrimination

We consider the problem of discriminating two nearby tones (for instance 440 Hz and 442 Hz) simultaneously present in an audio signal. This problem is important, for instance, in Doppler radar used for traffic speed surveillance. The uncertainty relation tells us that a signal of finite duration (or one being observed for a finite time) necessarily has a Fourier spectrum of finite width: δt.δν > 1. Thus, as we have a frequency difference of δν = 2 Hz in the given example, we should observe the signal at least for a time of order δt = 1/2 = 0.5 s, or about 220 cycles at the average frequency 441 Hz. This value is a lower bound to the observation time δt which often has to be exceeded appreciably in order to get useful results.
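The tone discrimination example can be made concrete with a small numerical experiment. The sketch below is an illustration under stated assumptions: the signal is sampled at an assumed rate of 2000 Hz, its spectrum is evaluated with a directly computed discrete Fourier sum (a topic of chapter 4), and a peak is counted when a spectral line exceeds half the largest line. None of these choices comes from the text.

```python
import cmath
import math

FS = 2000.0              # sampling rate in Hz (illustrative assumption)
TONES = (440.0, 442.0)   # the two tones of the example

def band_spectrum(duration, f_lo=430.0, f_hi=452.0):
    """Spectral magnitudes between f_lo and f_hi for an observation of
    `duration` seconds; the spectral lines are spaced 1/duration Hz apart."""
    n_samples = int(round(duration * FS))
    x = [sum(math.sin(2 * math.pi * f * n / FS) for f in TONES)
         for n in range(n_samples)]
    k_lo = math.ceil(f_lo * duration)     # line k lies at frequency k/duration
    k_hi = math.floor(f_hi * duration)
    spectrum = []
    for k in range(k_lo, k_hi + 1):
        coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples))
        spectrum.append((k / duration, abs(coeff) / n_samples))
    return spectrum

def count_peaks(spectrum):
    """Number of local maxima exceeding half of the largest magnitude."""
    mags = [m for _, m in spectrum]
    threshold = 0.5 * max(mags)
    return sum(1 for i in range(1, len(mags) - 1)
               if mags[i] > threshold and mags[i - 1] < mags[i] > mags[i + 1])

print(count_peaks(band_spectrum(0.25)))   # observation shorter than 0.5 s
print(count_peaks(band_spectrum(2.0)))    # observation well above 0.5 s
```

With the short observation the two tones merge into a single spectral peak; with the longer one they appear as separate lines.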

3.8 Some properties of the Fourier transform

Symmetry properties: (3.8a)
if f(t) is a real function, then F(Ω) = F*(-Ω), hence |F(Ω)|² = |F(-Ω)|²;
if f(t) is even, then so is F(Ω): F(Ω) = F(-Ω);
if f(t) is odd, then so is F(Ω): F(Ω) = - F(-Ω);
if f(t) is real and even, then so is F(Ω) (cosine transform).

Linearity: (3.8b)
a.f(t) + b.g(t) ↔ a.F(Ω) + b.G(Ω).

Time-shift property: (3.8c)
f(t - t0) ↔ exp(- i t0Ω).F(Ω) = exp(i ϕ(Ω) - i t0Ω).A(Ω).

Frequency-shift property (modulation property): (3.8d)
F(Ω - Ω0) ↔ exp(+ i Ω0t).f(t).

Transform of a derivative: (3.8e)
$$\frac{d^n f(t)}{dt^n} \leftrightarrow (i\Omega)^n F(\Omega).$$
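Properties (3.8c) and (3.8e) can be verified numerically for a concrete function. The sketch below assumes a Gaussian test function exp(-t²/2), whose transform under convention (3.6) is the known result √(2π)·exp(-Ω²/2); the integration limits, the step count and the names are choices of this example.

```python
import cmath
import math

def fourier_transform(f, omega, t_min=-20.0, t_max=20.0, steps=8000):
    """Approximate F(Omega) = integral of f(t) exp(-i Omega t) dt (midpoint rule)."""
    h = (t_max - t_min) / steps
    total = 0j
    for k in range(steps):
        t = t_min + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * omega * t)
    return total * h

gauss = lambda t: math.exp(-t * t / 2)      # test function (illustrative choice)
t0 = 1.5                                    # time shift (illustrative value)
shifted = lambda t: gauss(t - t0)
d_gauss = lambda t: -t * gauss(t)           # derivative of the test function

omega = 0.7
F = fourier_transform(gauss, omega)
F_shifted = fourier_transform(shifted, omega)
F_derivative = fourier_transform(d_gauss, omega)

print(abs(F_shifted - cmath.exp(-1j * t0 * omega) * F))   # time-shift property (3.8c)
print(abs(F_derivative - 1j * omega * F))                 # derivative property (3.8e), n = 1
```

Both printed differences should be at the level of rounding error.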

3.9 Linear Time Invariant (LTI) systems

A system S is a black box which a signal x(t) enters and from which a signal y(t) leaves that is a modified version of x(t): y(t) = S{x(t)}. S is an LTI system if and only if the following conditions are met:

y(t + T) = S{x(t + T)} for any T (time-invariance); (3.9a)
if y1 = S{x1} and y2 = S{x2} then S{ x1 + x2 } = y1 + y2 (additivity); (3.9b)
if y = S{x} then S{a.x} = a.y for any constant a (scaling invariance). (3.9c)

A system's output at any moment depends both on the current input and on the system's state (its memory). If the state is represented by a vector of n independent quantities, we say that we are dealing with an n-th order system. Such a system can be described by either a system of n first-order differential equations, or by a single n-th order differential equation. Both tell how the state evolves from its current value under the influence of the input signal (although the state vector may not always be explicitly visible in the equation). If the output signal y isn't simply equal to one of the state components, a separate (algebraic, not differential) output equation is needed.

3.10 Example of an LTI system: a low-pass filter

Consider a particular LTI system S for which the relation between input signal x and output signal y is described by the first-order linear differential equation

$$\tau\,\dot{y} + y(t) = x(t) \qquad (3.10a)$$

or equivalently

$$\dot{y} = \frac{1}{\tau}\,(x - y) . \qquad (3.10b)$$

In these equations $\dot{y}$ is an abbreviation of dy(t)/dt.

Equation (3.10b) shows how the change in state variable y depends on the current values of x(t) and y(t). In this case, the output variable y happens to be the same as the state variable. The parameter τ in equation (3.10) represents the system's characteristic time.


If we consider a second-order system

$$\frac{d^2 y(t)}{dt^2} + a\,\frac{dy(t)}{dt} + b\,y(t) = x(t)$$

we can expose the state vector by putting dy(t)/dt = z(t). Then we obtain a system of two first-order equations

$$\frac{dy(t)}{dt} = z(t) , \qquad \frac{dz(t)}{dt} = -a\,z(t) - b\,y(t) + x(t)$$

which clearly shows how the rate of change (i.e., the derivative) of the state vector (y,z) depends on the current state and the current input x. In this case too, the output y is identical to one of the components of the system's state.

Which output signal y(t) is produced by S in response to some input signal x(t)? To answer this question we first calculate the response to a sinusoidal input signal with frequency Ω and real amplitude X

$$x(t) = X\,e^{i\Omega t}$$

By substitution we can verify easily that the output signal has the form

$$y(t) = Y\,e^{i\Omega t}$$

where Y is a complex quantity in general, which can be written as Y = |Y|.e^{iϕ}. In words: the system's output signal is a sinusoid of the same frequency as the input signal, although its phase and amplitude may be different. This is true for any LTI system. (Sinusoids are therefore called the eigenfunctions of LTI systems). Some calculation shows that for system (3.10) the relation between Y and X is

$$|Y| = \frac{|X|}{\sqrt{1 + \Omega^2\tau^2}} \; ; \quad \phi = \arctan(-\Omega\tau) . \qquad (3.10c)$$

The first expression in (3.10c) shows why this LTI system behaves as a low-pass filter: while for Ω = 0 the input and output amplitudes |X| and |Y| are equal, their ratio |Y|/|X| becomes ever smaller if Ω is increased. Formula (3.10c) gives the frequency response function of system S in terms of an amplitude- and a phase-response function. Alternatively, these can be combined in a single complex frequency response function

$$H(\Omega) = \frac{Y(\Omega)}{X(\Omega)} = \frac{1}{1 + i\Omega\tau} . \qquad (3.10d)$$

For convenience we have written here both X and Y as functions of Ω. It is not difficult to see that in this particular example S belongs to the class of LTI systems.

Now the fact that the behaviour of S for a sinusoidal input signal can be described by a frequency response function H(Ω) suggests that we can use this also for calculating the response to an arbitrary input signal x(t). For, according to Fourier, any x(t) can be regarded as a superposition of sinusoids, and an LTI system behaves additively with respect to such a superposition. In other words, between the Fourier transforms X(Ω) and Y(Ω) of our arbitrary x(t) and the corresponding response y(t), we expect a similar relation as between the sine amplitudes X(Ω) and Y(Ω) in the previous example: Y(Ω) = H(Ω).X(Ω). Formally, we can interpret H(Ω) as the Fourier transform, in the sense of (3.6), of some time function h(t):

$$H(\Omega) = \int_{-\infty}^{\infty} h(t)\,e^{-i\Omega t}\,dt .$$

After some calculations we then find the following relation between x(t), y(t) and h(t):

$$y(t) = \int_{-\infty}^{\infty} x(t')\,h(t - t')\,dt' = x(t) * h(t), \qquad (3.11)$$

which is called the convolution product or convolution of x(t) and h(t). The significance of h(t) can be seen if we take as input signal the Dirac delta function: x(t) = δ(t). Then

$$y(t) = \int_{-\infty}^{\infty} \delta(t')\,h(t - t')\,dt' = h(t), \qquad (3.12)$$

where we have used the sifting property of the δ-function. For obvious reasons, h(t) is called the impulse response of the system S. Like its Fourier transform H(Ω), the impulse response h(t) provides a complete description of an LTI system. Notice that the convolution operator is commutative: i.e. x(t) * h(t) = h(t) * x(t).

3.11 Low-pass filters

Consider the modulus (or amplitude) part A(Ω) of the frequency response function of a special kind of LTI system called low-pass (LP) filter. (A(Ω) is often called the filter's frequency characteristic). For an ideal LP filter (a 'cardinal' filter) one would expect that A(Ω) = 1 for |Ω| < Ωc, and A(Ω) = 0 for |Ω| > Ωc where Ωc > 0 is the filter's cut-off frequency. In practice, such a filter cannot be realized from electrical components (resistors, capacitors, inductors) for various reasons. The most fundamental of these is the causality condition, which states that a cause cannot be preceded by its effect. This means for an LTI system that its impulse response function h(t) must be zero for t < 0. It can be shown that the impulse response of an ideal filter is necessarily symmetric around t = t0 (where t0 is the filter's delay time) and therefore cannot be causal. Examples of realistic low-pass filter frequency characteristics are shown in figure 3-5.
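The first-order low-pass filter of section 3.10 lends itself to a quick numerical experiment. The sketch below integrates (3.10b) with the forward Euler method and compares the steady-state amplitude with (3.10c); it also evaluates the convolution (3.11) for a unit-step input. The values of τ and of the step sizes are illustrative, and the causal impulse response h(t) = exp(-t/τ)/τ is the standard inverse transform of (3.10d), which the text does not state explicitly.

```python
import math

TAU = 0.1     # characteristic time in seconds (illustrative value)
DT = 1e-4     # Euler integration step in seconds (illustrative value)

def simulate(x, n_steps):
    """Integrate (3.10b), dy/dt = (x - y)/tau, with the forward Euler method."""
    y, out = 0.0, []
    for k in range(n_steps):
        out.append(y)
        y += DT * (x(k * DT) - y) / TAU
    return out

def amplitude_ratio(omega):
    """Steady-state output amplitude for a unit-amplitude sinusoidal input."""
    settle = int(round(20 * TAU / DT))               # let the transient die out
    period = int(round(2 * math.pi / omega / DT))    # then look at one full period
    y = simulate(lambda t: math.cos(omega * t), settle + period)
    return max(abs(v) for v in y[settle:])

def impulse_response(t):
    """h(t) = exp(-t/tau)/tau for t >= 0 and zero for t < 0 (causal)."""
    return math.exp(-t / TAU) / TAU if t >= 0 else 0.0

def convolve_at(x, t, step=1e-3, span=2.0):
    """Riemann-sum approximation of (3.11), written as the integral of
    h(u) x(t - u) du, which is allowed because convolution is commutative."""
    n = int(round(span / step))
    return step * sum(impulse_response((k + 0.5) * step) * x(t - (k + 0.5) * step)
                      for k in range(n))

unit_step = lambda t: 1.0 if t >= 0 else 0.0
print(amplitude_ratio(10.0), 1 / math.sqrt(1 + (10.0 * TAU) ** 2))
print(convolve_at(unit_step, TAU), 1 - math.exp(-1))
```

The second line compares the convolution result at t = τ with the familiar step response 1 - exp(-t/τ) of a first-order system.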

3.12 High-pass and band-pass filters

A high-pass filter can in principle be obtained by constructing the complement of the output signal of a low-pass filter. A special example of a HP filter is the differentiator, for which A(Ω) = |Ω| (cf. the differentiation property of the Fourier transform, eq. 3.8e).

Figure 3.5 Examples of low-pass filter frequency characteristics. From [3.4]

Likewise, band-pass filters can be synthesized by combining LP and HP filters. In practice this is almost never done, but it shows that there is no need to formulate separate theories for the three classes of filters. For a symmetrical band-pass filter it can be shown by means of the modulation property of the Fourier transformation that its impulse response is a 'wave packet' with an envelope similar to that of the corresponding LP filter.


3.13 Deconvolution

In principle, filters can be designed which undo the effect of other filters. However, this process (called deconvolution or inverse filtering) is inherently unstable and cannot normally be realized without additional measures by the class of LTI filters discussed in this syllabus.

3.14 Non-linear systems

The superposition principle, an essential feature of LTI systems, is not valid for non-linear systems like y(t) = x²(t). Such systems have the characteristic property that 'more frequencies leave than enter the system'. Often non-linearity is an unwanted side-effect due to the technical limitations of electronic components, like transistors. In audio electronics, for instance, non-linearity leads to 'intermodulation distortion'. On the other hand, non-linear filters also have important 'constructive' applications. The median filter (a so-called order-statistic filter) is an important example.
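To give an impression of what a median filter does, here is a minimal sketch operating on a short list of samples. The window width of 3, the truncation of the window at the edges and the test signal are assumptions of this example; a moving average over the same window is included as a linear counterpart.

```python
from statistics import median

def median_filter(x, width=3):
    """Replace each sample by the median of a window of odd `width` centred
    on it. Near the edges the window is truncated (one possible convention)."""
    half = width // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

def moving_average(x, width=3):
    """Linear counterpart: the mean over the same window."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

# A step from level 1 to level 2, disturbed by a single outlier (the value 9).
signal = [1, 1, 1, 9, 1, 1, 2, 2, 2, 2]
print(median_filter(signal))
print(moving_average(signal))
```

The median filter removes the outlier while keeping the step edge sharp; the moving average smears the outlier over its neighbours.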

3.15 An introduction to stochastic signals

In the introduction to this chapter it was already mentioned that regarding a signal as a deterministic phenomenon is often not appropriate in practice. One reason is that really interesting, information conveying signals are never exactly predictable. In other words: such signals have a random or stochastic nature. A given or measured signal x(t) is regarded to be a realization of a stochastic process x(t). The statistical description is concerned with the ensemble of all possible realizations of the process, although in reality we have only one realization: x(t). The expectation E{x} of x(t1) (that is, at a certain instant t1) is defined by

$$E\{x\} = \mu = \int_{\Xi_1} x\,p(x)\,dx$$

where p(x) is the pdf (probability density function) of x at time t1 and Ξ1 is the set of all values that x can take at time t1. The variance at t1 is the expectation of (x(t1) - µ)²:

$$E\{(x - \mu)^2\} = \sigma^2 = \int_{\Xi_1} (x-\mu)^2\,p(x)\,dx$$

Of course, p(x) cannot be inferred from the set Ξ1 (because we know only one of its members). Therefore p(x) is an a priori known distribution which summarizes our knowledge of the process at time t1.


Often it is assumed for more or less plausible reasons that p(x) is 'normal' or Gaussian:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

If our process x is stationary, then one single p(x) is valid for every instant t. If the process is ergodic as well, we may substitute time-averages for ensemble-averages:

$$\mu = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t)\,dt , \qquad \sigma^2 = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} (x(t)-\mu)^2\,dt$$

Ergodicity implies, roughly speaking, that every single realization of the process goes through all the values the process can possibly generate. Note that in the above expressions the explicit dependence on p(x) has disappeared! In what follows we will always assume that the process is stationary and ergodic. Moreover we will assume that the mean is µ = 0. Then σ² is the mean power of the process x(t).

The behaviour of a process at time t2 is never independent of what happens at a nearby instant t1 (independence is only found in the idealized 'white noise' process). The autocovariance function γxx(t2-t1) shows the similarity (more precisely: the linear dependence) between the process and its replica time-shifted over τ = t2 - t1. If the process is assumed to be stationary, γxx depends only on the time-distance τ, not on the absolute values of t1 and t2. Thus we can write γxx(τ) instead of γxx(t2-t1). If the process is stationary and ergodic, we can write γxx as time average over a single realization:

$$\gamma_{xx}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t)\,x(t+\tau)\,dt \qquad (3.13)$$

Notice that γxx(0) = σ². Often the autocorrelation function ρ is used:

$$\rho_{xx}(\tau) = \frac{\gamma_{xx}(\tau)}{\sigma^2} \qquad (3.14)$$

It has the following properties: ρxx(0) = 1; ρxx(τ) = ρxx(-τ); |ρxx(τ)| ≤ 1.

Besides the autocovariance function we have the cross-covariance function which tells about the similarity between a stochastic process and a time-shifted replica of a different stochastic process. Under similar assumptions, the cross-covariance function can be written as

$$\gamma_{xy}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} x(t)\,y(t+\tau)\,dt \qquad (3.15)$$
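The time averages (3.13) and (3.14) can be estimated from a single sampled realization. The following sketch assumes a uniformly sampled signal of finite length, so the integral becomes a sum and the limit T → ∞ is replaced by a long but finite record. The test signal (white Gaussian noise passed through the recursion v ← 0.9·v + noise) is purely an assumption for the demonstration.

```python
import random

def autocovariance(x, lag):
    """Time-average estimate of gamma_xx at an integer lag, for a zero-mean
    sequence. Dividing by len(x) for every lag keeps |rho| <= 1."""
    lag = abs(lag)                      # gamma_xx is symmetric in the lag
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / len(x)

def autocorrelation(x, lag):
    """Estimate of rho_xx(lag) = gamma_xx(lag) / gamma_xx(0), cf. (3.14)."""
    return autocovariance(x, lag) / autocovariance(x, 0)

# Illustrative test signal: correlated noise with a known correlation of
# about 0.9 between neighbouring samples.
random.seed(1)
samples, value = [], 0.0
for _ in range(20000):
    value = 0.9 * value + random.gauss(0.0, 1.0)
    samples.append(value)

mean = sum(samples) / len(samples)
x = [s - mean for s in samples]         # enforce the assumption mu = 0
rho = [autocorrelation(x, k) for k in range(6)]
print([round(r, 3) for r in rho])
```

For this test signal the estimated autocorrelation decays roughly geometrically with the lag, the discrete counterpart of the exponential acf discussed in section 3.16.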

3.16 Power spectrum

In the frequency domain description of a stationary stochastic process, phase doesn't appear: obviously a shift of the time axis cannot make any difference in the properties of the process. Therefore the frequency domain description is restricted to the power spectrum P(Ω). Wiener and Khinchin have shown that P(Ω) is the Fourier transform of the autocovariance function:

P(Ω) ↔ γxx(τ) (3.16)

This relation can be seen as a definition of the power spectrum for stochastic signals.

Example: the 'random telegraph signal'

An oversimplified model of a process y(t) producing Morse signals is based on the assumption that the timing of the on/off (0↔1) transitions can be described by a Poisson process. Then the probability of k transitions occurring in the time interval T is

$$\mathrm{Prob}(k,T) = \frac{(aT)^k}{k!}\,e^{-aT}$$

where a is the average number of transitions per unit of time. It is not difficult to show that in this case the autocovariance function (acf) looks like

$$\gamma_{yy}(\tau) = \frac{1}{4}\,\big(1 + e^{-2a|\tau|}\big)$$

By the transformation x(t) = y(t) - 1/2 we obtain a process x(t) with zero average and acf

$$\gamma_{xx}(\tau) = \frac{1}{4}\,e^{-2a|\tau|} \qquad (3.17)$$

For 'technical' signals, exponential acf's are quite common. However for 'natural' signals (like electrocardiograms, ocean waves, noise in semiconductor devices, music) the acf tends to behave like a low-order polynomial. This means that correlation is present over many time scales. Such signals are sometimes termed 'chaotic' or 'fractal'. The power spectrum of x(t) is the Fourier transform of γxx(τ):

$$P(\Omega) = \frac{1}{4}\int_{-\infty}^{\infty} e^{-2a|\tau|}\,e^{-i\Omega\tau}\,d\tau = \frac{a}{4a^2 + \Omega^2} . \qquad (3.18)$$

With a = 10 s⁻¹ (representative for manual Morse telegraphy) the 3 dB width is 3.2 Hz, the 40 dB width 318 Hz.
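The widths quoted for a = 10 s⁻¹ follow directly from (3.18). As a worked check, the sketch below solves P(Ω) = P(0)/ratio for Ω and converts the result to Hz; the function names are illustrative, and 'width' is taken here to mean the frequency at which the spectrum has dropped by the stated number of decibels.

```python
import math

def power_spectrum(omega, a):
    """P(Omega) from (3.18) for a transition rate a (transitions per second)."""
    return a / (4 * a * a + omega * omega)

def width_hz(a, attenuation_db):
    """Frequency in Hz at which P has dropped `attenuation_db` below P(0)."""
    ratio = 10 ** (attenuation_db / 10)        # power ratio P(0)/P(Omega)
    omega = 2 * a * math.sqrt(ratio - 1)       # solve a/(4a^2 + Omega^2) = P(0)/ratio
    return omega / (2 * math.pi)

a = 10.0
print(round(width_hz(a, 3), 1), round(width_hz(a, 40)))
```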

3.17 References and further reading

3.1 A. Papoulis: The Fourier Integral and its Applications, McGraw-Hill 1962.
3.2 A. Papoulis: Signal Analysis, McGraw-Hill 1984.
3.3 D.C. Champeney: Fourier Transforms and their Physical Applications, Academic Press 1973.
3.4 A.V. Oppenheim, A.S. Willsky: Signals and Systems, Prentice-Hall 1983.
3.5 E.H. Dooijes: Syllabus Digitale Signaalverwerking, 1996.

4. Discrete-time signals and systems

4.1 Discrete-time signals

A discrete-time signal is an ordered sequence of numbers {xn}. Each xn can be regarded as the value of a function x with integer argument n: xn = x[n]. (If we are dealing with functions of integer argument, we use square brackets for including the argument. Hence X(ω) and X[k] are different functions!). We use the phrase discrete time to make it clear that the independent variable is of discrete nature, not the function value x. Though the sequence {xn} does not necessarily result from sampling a continuous-time signal, we will often refer to the individual members of the sequence as samples. Some special discrete-time functions are the unit-impulse

$$\delta[n] = \begin{cases} 0 & \text{for } n \neq 0 \\ 1 & \text{for } n = 0 \end{cases}$$

and the unit-step

$$u[n] = \begin{cases} 0 & \text{for } n < 0 \\ 1 & \text{for } n \geq 0 \end{cases}$$

[Figure: (a) the unit-impulse and (b) the unit-step, plotted against n.]

Notice that δ[n] = u[n] - u[n-1], and

$$u[n] = \sum_{m=-\infty}^{n} \delta[m] .$$

Also important is the class of exponential functions x[n] = c.αn. Examples: α = 1, -1, 2, -2, 1/2, exp(iω). In the last case we can also write x[n] = cos(ωn) + i.sin(ωn).
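The special sequences introduced above are easily written down as functions of an integer argument. The short sketch below checks the two relations between δ[n] and u[n] and the Euler form of the complex exponential; the function names and the finite test range are assumptions of this example.

```python
import cmath
import math

def delta(n):
    """Unit impulse: 1 for n = 0, otherwise 0."""
    return 1 if n == 0 else 0

def u(n):
    """Unit step: 0 for n < 0, 1 for n >= 0."""
    return 1 if n >= 0 else 0

def exponential(c, alpha, n):
    """Exponential sequence x[n] = c * alpha**n."""
    return c * alpha ** n

ns = range(-5, 6)
difference_ok = all(delta(n) == u(n) - u(n - 1) for n in ns)
# The running sum starts at m = -50 instead of minus infinity;
# delta[m] is zero below that bound anyway.
running_sum_ok = all(u(n) == sum(delta(m) for m in range(-50, n + 1)) for n in ns)

omega = 2 * math.pi / 8
euler_ok = all(
    abs(exponential(1, cmath.exp(1j * omega), n)
        - complex(math.cos(omega * n), math.sin(omega * n))) < 1e-12
    for n in ns
)
print(difference_ok, running_sum_ok, euler_ok)
```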

[Figures: exponential sequences x[n] = c.α^n plotted against n for several values of α, among them α = 1 and α > 1; the sinusoidal sequences x[n] = cos(2πn/8) and x[n] = sin(2πn/10).]

Question A. Does a unique function x[n] exist for every ω? Answer: no, because exp(iωn) = exp(i(ω + m.2π)n), m ∈ Z. Therefore we only have to pay attention to values of ω on an interval of length 2π, for instance (0,2π) or (-π,π).

Question B. Is x[n] a periodic function? In that case, x[n] = x[n + N] would be true for all n and a certain value of N:

$$\exp(i\omega n) = \exp(i\omega(n + N)) = \exp(i\omega n)\,\exp(i\omega N)$$

The last expression equals the first one only if exp(iωN) = 1, and this is true only if

$$\omega N = k\cdot 2\pi, \quad \text{or} \quad \frac{\omega}{2\pi} = \frac{k}{N} , \quad k \in Z.$$

The answer to our question is therefore: yes, provided that ω/2π is a rational number. In that case the fundamental period of x[n] is N = k.2π/ω, provided that N and k have no factors in common. Now we can write down the set of all periodic exponential functions with period N as follows:

$$\phi_k[n] = \exp(i\omega_k n) = \exp\left(i\,k\,\frac{2\pi}{N}\,n\right), \quad k \in Z$$

The ωk's are multiples (harmonics) of the fundamental frequency 2π/N. However, $\phi_k = \phi_{k+mN}$, m ∈ Z, which implies that there are only N different functions ϕk[n]. Therefore, we consider only those functions ϕk[n] indexed with k = 0, 1, ..., N-1.
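The answer to question B can be turned into a small computation: if ω/2π is given as a rational number, reducing it to lowest terms k/N yields the fundamental period N. The sketch below uses Python's `fractions.Fraction` for the reduction; the function names and test values are illustrative assumptions.

```python
import cmath
import math
from fractions import Fraction

def fundamental_period(ratio):
    """Fundamental period N of x[n] = exp(i*omega*n) when omega/(2*pi) equals
    the rational number `ratio`; Fraction reduces k/N to lowest terms."""
    return Fraction(ratio).denominator

def x(ratio, n):
    """Sample n of exp(i*omega*n) with omega = 2*pi*ratio."""
    return cmath.exp(2j * math.pi * float(ratio) * n)

r = Fraction(3, 8)                    # omega = 3 * (2*pi/8)
N = fundamental_period(r)
print(N, abs(x(r, 5 + N) - x(r, 5)))  # shifting by N reproduces the sample
print(fundamental_period(Fraction(2, 8)))   # 2/8 reduces to 1/4
```

An irrational ω/2π has no such representation, in agreement with the statement that x[n] is then not periodic.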


For N = 8 this set is pictured in figure 4-1 below. Notice that each ϕk has both a real and an imaginary component.

[Figure 4-1: for k = 0, 1, ..., 7, the real part Re ϕk = cos(2πkn/8) and the imaginary part Im ϕk = sin(2πkn/8), plotted against n.]

Figure 4.1: Basis functions for N = 8
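The N = 8 basis functions can be generated directly, and their mutual orthogonality checked in the same way as for the continuous-time basis of chapter 3, with the integral replaced by a sum over one period and one factor complex-conjugated. The sketch below also expands an arbitrary test sequence in this basis, anticipating the expansion of section 4.2; the test sequence and the helper names are assumptions of this example.

```python
import cmath

N = 8   # number of samples per period, as in figure 4-1

def phi(k, n):
    """Basis function phi_k[n] = exp(i k (2 pi / N) n)."""
    return cmath.exp(2j * cmath.pi * k * n / N)

def inner(k, l):
    """Sum over one period of phi_k[n] times the complex conjugate of phi_l[n]."""
    return sum(phi(k, n) * phi(l, n).conjugate() for n in range(N))

def coefficients(x):
    """Projection of x on each basis function, divided by (phi_k, phi_k) = N."""
    return [sum(x[n] * phi(k, -n) for n in range(N)) / N for k in range(N)]

def synthesize(X):
    """Weighted sum of the basis functions with weights X[k]."""
    return [sum(X[k] * phi(k, n) for k in range(N)) for n in range(N)]

x = [3, 1, 4, 1, 5, 9, 2, 6]          # arbitrary test sequence
X = coefficients(x)
x_back = synthesize(X)
print(abs(inner(3, 3)), abs(inner(2, 5)))
print(max(abs(a - b) for a, b in zip(x, x_back)))
```

The reconstruction error is at the level of rounding, illustrating that N basis functions suffice to represent any sequence of N samples.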

4.2 Discrete Fourier Transform (DFT)

Theorem: any function defined on N consecutive integers can be exactly written as

$$x[n] = \sum_{k=0}^{N-1} X[k]\,\phi_k[n] \qquad (4.1a)$$

$$X[k] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\,\phi_k[-n] \qquad (4.1b)$$

with ϕk[n] = exp(i.k.(2π/N).n). This is the Discrete Fourier Transform (DFT), the discrete-time analogue of the Fourier series expansion of a continuous-time function defined on a finite interval. An important difference is that, in the discrete case, an exact representation is obtained with a finite number of terms, whereas in the continuous case any finite number of terms provides only an approximation in the least-squares sense. The DFT is very important in practical applications; the Fast Fourier Transform is an efficient algorithm (invented by Cooley and Tukey in 1965) for computing the X[k] from the x[n] and vice versa.

4.3 Discrete-time Fourier transformation (DTFT)

In a similar way as we arrived at the continuous-time Fourier transformation (CTFT), we can formulate the Fourier transformation for an infinite discrete-time sequence x[n]:

$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)\,e^{+i\omega n}\,d\omega ,$$

-∞ < n [...]

[...] 1. This means that a certain amount of correlation exists between subsequent data values. The entropy of the first value of our data sequence would be 8 bits (all 256 values being equally probable). However, the entropy of each next symbol is only 1.5 bits. So by encoding the difference, we shrink from 8 bits to 1.5 bits. This example also illustrates the danger of decorrelation: if one difference is in error then all subsequent reconstructed data values will contain this error. For this reason decorrelation is always applied over relatively short blocks of data. Each block starts afresh with a full representation of the corresponding data value. In this way the possibility of error propagation can be reduced.

5.2 The capacity of a transmission channel

In chapter 4 we have shown that a band-limited signal, ranging in the frequency domain from 0 to W Hz, can be exactly represented by 2W samples per second. Thus we can say that the number of degrees of freedom (d.o.f.) of a signal segment of duration T seconds is 2WT.


Notice that every d.o.f. should be associated with a continuously variable quantity. So it is not possible to derive from this how many bits of information C can be conveyed per second by this signal in practical cases; C would be an infinite number if each d.o.f. could take any value! Even the practical fact that the total power (energy per second) P of the signal should be restricted to some upper bound does not change this observation. To obtain a useful estimate of the capacity C we have to compare the signal power P to the power N of the noise which is unavoidably added to the signal while it propagates through its channel. The following expression for the channel capacity is a key result of Shannon's information theory:

$$C = W \log_2 \frac{P + N}{N} \qquad (5.6)$$

under the assumption that we have thermal noise (i.e., white and Gaussian). Notice that P + N is the total power of the received signal. The derivation of this expression is based on the result [5.3] that the entropy of band- and time-limited thermal noise of power N is $\tfrac{1}{2}\log_2(2\pi e N)$ per d.o.f. Because there are 2WT degrees of freedom, the entropy per second is $W \log_2(2\pi e N)$. The entropy of the received signal is maximal if the source signal is itself thermal noise. Since the capacity of the channel is defined as the maximal difference between received signal entropy and noise entropy we have

$$C = W \log_2\big(2\pi e (P + N)\big) - W \log_2(2\pi e N) = W \log_2 \frac{P + N}{N} .$$
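Expression (5.6) is easily evaluated for given bandwidth and signal-to-noise ratio. In the sketch below the function name and its arguments are illustrative; the signal-to-noise ratio is supplied in decibels and converted to a power ratio P/N = 10^(dB/10).

```python
import math

def capacity(bandwidth_hz, snr_db):
    """Channel capacity (5.6) in bits per second, for white Gaussian noise."""
    power_ratio = 10 ** (snr_db / 10)             # P/N from decibels
    return bandwidth_hz * math.log2(1 + power_ratio)

# Illustrative values: a 2400 Hz channel at three signal-to-noise ratios.
for snr_db in (20, 35, 50):
    print(snr_db, round(capacity(2400, snr_db)))
```

Because of the logarithm, every additional 10 dB of signal-to-noise ratio adds roughly the same number of bits per second at a fixed bandwidth.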

For example, take a voice-quality telephone line with W = 2400 Hz and a signal-to-noise ratio SNR = 35 dB (see footnote 5). Then 1 + P/N ≈ 3163, and the capacity is 2400 × 11.6 ≈ 28000 bits per second, which is the maximal capacity of today's dial-up lines. Actually the SNR of 35 dB was obtained here by calculating backwards from the known maximum bit rate. In reality the SNR should be better than 35 dB to achieve this, because the above conditions for maximal capacity are seldom satisfied. Evidently, these considerations don't tell us anything about the way we could in practice push this amount of bits through the line! In section 5.6 we will briefly treat a few ways to accomplish this.

Footnote 5: Notice that the number of decibels corresponding to a power (not amplitude) ratio P1/P2 is 10.log10(P1/P2).

5.3 Error-detecting and error-correcting codes

In the previous section it was shown that a transmission channel (a general term for any medium where data are stored temporarily) has a limited capacity because of the unavoidable occurrence of noise. This means, of course, that we can never have the guarantee that a stream of bits entering the channel will leave it unchanged: errors are always to be expected. For this reason coding systems have been developed that add redundancy to data instead of removing it. In practice we always find combinations of the two approaches: usually gross redundancy is first removed by the techniques discussed in the next chapter; then on a smaller scale (for instance for every group of four bits as will be discussed below) redundancy is introduced again. We now give a number of ways this is often done in practice.

5.3.1 Parity checking

In simple short-distance transmission (for instance by the RS-232 protocol), to each group of seven bits representing an ASCII character another bit is added in such a way (i.e. by adding either a 0 or a 1 bit) that the total number of 1 bits is always even (see footnote 6). If an odd-parity byte is detected at the receiving station, one can be sure that an error has occurred. However it is not possible to deduce exactly which data bit was in error. One way to make this possible is the '4 out of 7' code, discussed in the next section.

5.3.2 Error correcting codes

As an example of error correcting codes we discuss the (7,4) Hamming code [5.1]. We split the bit stream 0000100010100100 into groups of four bits: 0000 1000 1010 0100 and complete each group by three bits, b5, b6, b7 computed as follows:

b5 = b2 + b3 + b4 mod 2;
b6 = b1 + b3 + b4 mod 2;
b7 = b1 + b2 + b4 mod 2.

Thus our coded bitstream is 0000000 1000011 1010101 0100101. It is assumed that there is at most one bit in error in each group of 7. This assumption is usually justified: in dial-up lines the error rate is of order 1 in 10^5, in high-quality lines less than 1 in 10^10. In the case of no error we should have

b2+b3+b4+b5 = even;
b1+b3+b4+b6 = even;
b1+b2+b4+b7 = even

whatever the data bits b1, b2, b3 and b4 may be. Three parity checks must be performed by the receiver. If all three sums are found to be even there is no error; if just one sum is odd, the error is in the check bit in that sum. If exactly two sums fail, the error is in the data bit which is common to them, but not in the third. If all three sums fail, the error must be in b4.

A more involved but related error checking technique is CRC (cyclic redundancy check). CRC provides better safety against error bursts. Error correcting codes are extensively used not only in data transmission systems, but also in computer memory, data storage media and in the audio compact disc. See for instance [5.4] for more detailed information.
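The encoding and correction rules of section 5.3.2 can be written out in a few lines. The sketch below follows the parity equations given above; the function names and the lookup table `ERROR_POSITION` are choices of this example, and bits are stored in Python lists with index 0 holding b1.

```python
def encode(data):
    """Append the check bits b5, b6, b7 to the four data bits b1..b4."""
    b1, b2, b3, b4 = data
    return [b1, b2, b3, b4,
            (b2 + b3 + b4) % 2,
            (b1 + b3 + b4) % 2,
            (b1 + b2 + b4) % 2]

# For each pattern of failed parity sums (s1, s2, s3): list index of the bit in error.
ERROR_POSITION = {
    (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6,   # one sum fails: its own check bit
    (0, 1, 1): 0, (1, 0, 1): 1, (1, 1, 0): 2,   # two sums fail: the shared data bit
    (1, 1, 1): 3,                               # all three fail: b4
}

def decode(word):
    """Correct at most one bit error in a 7-bit word and return the data bits."""
    b = list(word)
    s1 = (b[1] + b[2] + b[3] + b[4]) % 2        # b2 + b3 + b4 + b5
    s2 = (b[0] + b[2] + b[3] + b[5]) % 2        # b1 + b3 + b4 + b6
    s3 = (b[0] + b[1] + b[3] + b[6]) % 2        # b1 + b2 + b4 + b7
    if (s1, s2, s3) != (0, 0, 0):
        b[ERROR_POSITION[(s1, s2, s3)]] ^= 1    # flip the offending bit
    return b[:4]

def encode_stream(bits):
    """Encode a string of 0/1 characters in groups of four bits."""
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return " ".join("".join(map(str, encode([int(c) for c in g]))) for g in groups)

print(encode_stream("0000100010100100"))
received = encode([1, 0, 1, 0])
received[2] ^= 1                                # simulate an error in b3
print(decode(received))
```

As stated in the text, the scheme relies on there being at most one bit in error per group of seven; two errors in one group would be "corrected" to the wrong data.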

Footnote 6: In fact, most systems leave the freedom to select either even or odd parity checking, or to omit the parity checking.

5.4 Transmission of analog signals: amplitude modulation

Amplitude modulation (AM) was for many years the most important technique for imposing audio signals on high-frequency (100 kHz - 150 GHz) sinusoidal carrier signals, in order to transmit them by radio. Exactly the same technique is used for transmitting many signals (for instance telephone channels) simultaneously over a cable. The principle is simply to vary the amplitude S of the carrier as a function of the audio signal a(t): s(t) = S cos Ωct, with S = S0(1 + m.a(t)), S0 > 0, where the modulation depth parameter m is chosen so that m.a(t) doesn't exceed 1 in absolute value, thus ensuring that S never becomes negative. For convenience we suppose that a(t) is a simple sinusoidal tone: a(t) = A cos Ωmt. Then m must be chosen so that m.A