Project Paganini - Final Report Daniel Franklin (9573127)
April 28, 2005
Supervisor: Professor J. F. Chicharo Co-Supervisor: Professor C. D. Cook
Abstract

Music is an extremely rich and complex signal. With just four consecutive single notes of equal duration, a classical guitar can produce nearly four and a half million different progressions. With the addition of chords and changes in duration, these few notes can produce an enormous number of variations. Given this complexity, it is interesting to ask the question “is it possible to automatically extract enough information from the audio signal to reconstruct the original score?”. This project attempts to answer that question. Given a recording of some music, Paganini first performs a time-frequency analysis on the signal, effectively obtaining an envelope of the signal at each frequency where notes are expected to be found. These signals are then passed through a neural network which attempts to classify the data as specific notes and rests. The results of this process are written out as a LaTeX score, which can be manually corrected if required. The design acknowledges some of the non-linear characteristics of the human ear and brain, and crudely attempts to model those characteristics. The software is substantially complete and, although unable to match the human ear, does a very respectable job of extracting some of the key features of music. The DSP front end and neural network system are robust and have been rigorously tested. Although duration information is calculated, it is not included in the score. Consequently, the score is primitive but readable.
Contents
List of Figures
List of Tables
Acknowledgements

Many people have offered valuable advice and suggestions. They include Professor Joe Chicharo (my thesis supervisor), Dr Ian Burnett, Peter Vial, Dr Frank Stootman (UWS Macarthur), David Battle (ANSTO) and Bill Mitchell (my guitar teacher). Thanks should also go to Donald Knuth and Leslie Lamport, who developed the wonderful TeX and LaTeX typesetting packages, and Nicolo Paganini, whose exceptionally difficult (and immensely enjoyable) violin and guitar music inspired this project.
Chapter 1 Introduction

Music is an extremely complex one-dimensional signal, generally regarded as pleasing to the human ear. It is the result of interactions between the musician or ensemble, the musical score, the instrument and the environment (for example, the geometry of the surroundings). The resultant sound is affected by all of these factors, and no two performances are identical as a result of the complex nature of this interaction. To generate sounds which may be perceived as music, a performer reads the notation (which may be in such forms as Western musical script or tablature), adds his or her interpretation (which may include such things as dynamics and rubato) and expresses this via the instrument (which, through its own characteristics, further alters the signal). The listener then perceives the music after transmission of the sound through air and after processing by the human ear and brain.

The perceived signal is therefore highly complex, and it is a remarkable fact that the human mind can accurately identify most of the features of the original score. For example, most people can very accurately tap out a rhythm after hearing only a few bars of music. This rhythm signal is one of the most important and distinctive features of a musical signal. The other main parameter which defines music is that of pitch. Although much more difficult for the untrained human ear and mind to describe quantitatively, the relative (as opposed to absolute) frequency of successive notes (and simultaneous notes combined to form a chord) is relatively easy to reproduce. This means that one can transpose a piece of music between keys (altering the frequency of all notes by a fixed ratio) without greatly affecting the way the music sounds. A trained musician can, with careful practice, reproduce a passage of music on an instrument without ever having seen the score - ‘by ear’.
The musician will listen to short fragments of a recorded passage and attempt to replicate this on his or her instrument, repeating the process until it sounds the same. Some particularly gifted
musicians, such as the remarkable Nicolo Paganini after whom this project is named, could reproduce an entire piece of music on an entirely different instrument, after having heard it only once. Project Paganini broadly aims to automate this process - that is, given a fragment of music from a recording (confined to a solo classical guitar at present) it should be possible to run it through a suite of analysis tools and generate a reasonably accurate printed score.
Chapter 2 Project Specifications

2.1 Objectives and Scope
It is essential to reduce the scope of this project to manageable proportions. To achieve this objective, a series of restrictions on the types of recordings to be used has been imposed. Specifically:

• The recording must be of a solo instrument;

• Only one type of instrument will be analysed here, although it is desirable to make the process sufficiently general to allow the use of other instruments as well;

• The input recording will be a 44.1 ksample/sec mono recording taken from a CD source; and
• The recording should be reasonably free of noise and frequency-altering distortions such as wow and flutter.
As the author is a classical guitarist, and has access to a wide range of quality recordings of this music, this is the target instrument which has been chosen. Additionally, the author has access to two guitars with quite distinct sound quality, so short test samples and neural network training data such as known notes and chords can be created. Additional desirable features of the software include:

• Sufficient accuracy for this software to be practically useful;

• Practical speed of execution - 100% accuracy will be useless if it takes weeks to analyse a single piece of music;
• Minimal memory use - it should be possible to process the music block by block; and
• The musical score output format should produce good quality engraver-standard notation, and it should be possible to edit the score by hand or insert it into another score.
Chapter 3 Summary

3.1 Current Status
At the moment Paganini can perform the following functions:

• Extract signals showing the behaviour of the music at different frequencies over time;
• Make reasonably accurate decisions about the onset and duration of notes based on the above signals;
• Write this out in the form of a LaTeX file.

The DSP front-end of Paganini has been completed and works very well. It is fairly slow, taking around forty-five seconds on a Cyrix 686 P150+ machine (which has a relatively slow floating point unit) to analyse a five second fragment of music. This is acceptable, as the data array is currently stored in a less than optimal manner (favouring simplicity over speed). The decision-making code performs reasonably well on a large class of music, although it still has a number of limitations, such as an inadequate ability to cope with chords, and the variable quality of example data used to train the system. The output of this stage can be used to generate a rudimentary score. It has no timing information other than the notes being placed in the correct order. A simple duration quantiser could be added fairly easily, as a number of free software implementations of this sort of thing exist (in various MIDI-score conversion programs), so it would not be difficult to graft one of these tools on here. None of these packages performs particularly well, but they would provide at least some roughly accurate quantised duration information, provided that the tempo of the music remains approximately constant.
In addition to the main program itself, a number of data processing and visualisation tools have also been written using the Paganini library. These are briefly described in Appendix ??.
3.2 Possible Future Enhancements
Although this project has achieved a substantial level of usefulness, a number of further enhancements are possible. These include:

• Automated fine-tuning of the fundamental frequencies. This would make the whole process more robust, as most recordings are made on instruments which are slightly out of tune. See ?? for an example of the improvements which this would allow;

• Generation of useful duration information. See [?] and [?] for examples as to how this can be done;
• Accuracy improvements - better neural network training;

• Chord grouping decision-making code;

• Better (more informative) typesetting output, including dynamics notation;

• Speed improvements; and

• Code cleanups.
Chapter 4 Literature and Software Review

Unfortunately, DSP applied to music analysis - as opposed to the compression of music, speech or general audio - seems to be a fairly obscure topic in terms of the number of papers published on the subject. A number of useful papers have nevertheless been found, relating to various aspects of the project.
4.1 Music Analysis Papers
The idea of using wavelets to analyse music seems to be relatively popular, with a number of papers and web pages exploring the idea. L. Smith, in his paper “Listening to Musical Rhythms with Progressive Wavelets” [?], explored the idea of using wavelet analysis to extract information about the rhythm signal of a piece of music from MIDI files (although it appears that the techniques he develops may be useful for a more general signal). Essentially, in [?], a technique of rhythm analysis is presented which is based on a complex Gabor mother wavelet function:

g(t) = e^(−t²/2) e^(2πjω₀t)    (4.1)
which is essentially a decaying sinusoid. This signal is chosen as it seems to roughly model the shape of the musical signals which Smith investigates. According to Smith, this function does not actually form an orthogonal basis, but does preserve phase information about the signal - which is principally what Smith uses in his analysis. His input signal is a highly stylized form of music - a MIDI signal. MIDI files essentially contain information about the intention of the performer (the note they play, when it was played, and with what intensity). His objective, then, is to attempt to identify features such as the time signature based on the DWT of the signal. His examples clearly illustrate the effectiveness of this technique on fairly simple fragments of music, and on purely artificial rhythmic patterns.

Smith’s paper is interesting; however, it understandably takes a fairly simplistic view of music. Simply considering the inter-onset interval (IOI) (the time between the commencement of adjacent notes) and amplitude is of no help when dealing with contrapuntal music (such as music by Bach), where there are two or more quite independent rhythms being played on the same instrument. Unfortunately, this is probably the best kind of music on which to use the analysis technique outlined in Chapter ??, as its rhythm tends to be highly deterministic. However, in the absence of any superior technique, this may be the best system to use for determining time signatures in Project Paganini. This will be left for future work.
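As an illustration, Smith's Gabor mother wavelet of Equation 4.1 can be evaluated directly. The following C++ sketch uses the standard complex type; the function name and the choice of ω₀ in the comment are mine, not Smith's:

```cpp
#include <cmath>
#include <complex>

// Gabor mother wavelet g(t) = exp(-t^2/2) * exp(2*pi*j*w0*t):
// a complex sinusoid of centre frequency w0 under a Gaussian envelope.
std::complex<double> gabor(double t, double w0) {
    const double pi = std::acos(-1.0);
    const std::complex<double> j(0.0, 1.0);
    return std::exp(-t * t / 2.0) * std::exp(2.0 * pi * j * w0 * t);
}
```

Note that the magnitude |g(t)| = e^(−t²/2) is just the Gaussian envelope, independent of ω₀; the oscillation (and hence the phase information which Smith relies upon) lives entirely in the complex exponential.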
4.2 Web Pages Related to the STFT and Wavelets
There are many web pages containing useful information about DSP and wavelets in particular. Robi Polikar’s excellent wavelet tutorial web page [?] has a good general discussion of the history, applications and mathematics of wavelets. His is one of the few web pages to have extensive diagrams and images of wavelets - it also appears unique in discussing the continuous wavelet transform (CWT) in some depth. He discusses the short-time Fourier transform (STFT) and compares it to the CWT and discrete wavelet transform (DWT). This is useful, as the STFT is essentially what is being used here (except that a multi-resolution version is used). Some of the shortcomings of the different approaches are discussed, and the author makes a strong case for considering the use of wavelets in signal analysis applications of a multi-resolution nature.
4.3 Strang’s Discussion of Fourier and Wavelet Representations of Signals
In an appendix to an article appearing in American Scientist [?], Strang presents an informal discussion of the nature of music and the various methods of representing signals compactly. He suggests that the short-time Fourier transform has some limitations; in particular, he says that abrupt discontinuities in images or sounds are very difficult to analyse with the DFT. Instead, he proposes a wavelet approach, which would more accurately encode the intention of the composer and the nuances of the performer. In this paper, however, he does not suggest using wavelets as a means of feature extraction - rather as a technique for the compression of a signal such as music, speech or video. His frequent use of a musical analogy is interesting, however, and may suggest that there is some value in this approach. He states that a Fourier transform is not in itself a very appropriate method for the analysis of a signal which varies both in time and over a range of scales. Strang goes into the mathematics of wavelets, using the Haar wavelet as an example, in a fairly high-level and very readable manner.
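Strang's Haar example is simple enough to sketch in code. One level of the orthonormal Haar transform replaces each pair of samples with a scaled average (the approximation) and a scaled difference (the detail); the function name below is my own, not Strang's:

```cpp
#include <cmath>
#include <utility>
#include <vector>

// One level of the orthonormal Haar wavelet transform: each pair
// (a, b) maps to an average (a+b)/sqrt(2) and a detail (a-b)/sqrt(2).
// The sqrt(2) scaling keeps the transform energy-preserving.
std::pair<std::vector<double>, std::vector<double>>
haar_level(const std::vector<double>& x) {
    std::vector<double> approx, detail;
    const double s = std::sqrt(2.0);
    for (std::size_t i = 0; i + 1 < x.size(); i += 2) {
        approx.push_back((x[i] + x[i + 1]) / s);
        detail.push_back((x[i] - x[i + 1]) / s);
    }
    return {approx, detail};
}
```

A locally constant signal produces all-zero detail coefficients, which is the essence of the compression argument: smooth regions of a signal need very few wavelet coefficients, while an abrupt discontinuity affects only the few coefficients whose support straddles it.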
4.4 Software
Some packages used (and of potential use) in the development of Paganini are listed below.

GNU Lilypond. Lilypond [?] is a powerful music typesetting engine which, like Paganini, produces TeX code as its output. It takes its input from a description file which is much more user-friendly and flexible than raw TeX. It is still under heavy development, although the current development version is quite usable. It also supports conversion of MIDI files into its native ‘mudela’ file format and from mudela back to MIDI. This area of code could be very useful to Paganini (particularly for generating timing and rhythm information). It may be possible to support the mudela output file format, as the sources to Lilypond are freely available.

Rosegarden. Rosegarden [?] is a somewhat more primitive typesetting package than Lilypond; however, it does have an easy-to-use GUI. At the moment it is rather limited in its functionality, but it may be a good front-end to allow manual correction of scores generated with Paganini. As sources for Rosegarden are freely available, supporting its file format should not be too difficult. Rosegarden also has a MIDI to Rosegarden converter - this may be useful in the same way as Lilypond’s mi2mu package.

Mixviews. Mixviews [?] is not a music package per se, rather a general-purpose DSP tool. It features a GUI and allows for such operations as moving-window FFTs, LPC analysis, up- and down-sampling, various forms of digital filtering, reversal, DC adjustment/removal and so on. It has mainly been used for data visualisation purposes and for the recording of the neural network training data.

GNU Octave. Octave [?] is a free Matlab clone for Unix (and Linux). It is mostly Matlab compatible, and has many of the same functions available (including FFTs and some other DSP and statistical m-files). It is useful where data needs to be more directly manipulated, and it is a good system for prototyping code and testing ideas. It allows easy reading and writing of plain-text data files. Most of the examples in [?] run well under Octave (some require minor tweaking). Octave can be used as a shell for automatic data processing.

Gnuplot. Gnuplot [?] is a powerful free data visualisation package for Unix (and Linux) and is very convenient where plotting of data or functions is desired without the overhead of going through Octave (which uses Gnuplot for function and data plotting anyway). Like Octave, Gnuplot can also be invoked as a shell (for use in scripts) to produce automated plots.
Chapter 5 Fundamentals of Music and Music Notation

Before the algorithms can be discussed, it is useful to briefly discuss the principles of music and music notation, and how they relate to the project. The human ear and brain analyse frequencies in a logarithmic fashion. That is, a listener would consider the intervals between a sequence of notes with frequencies doubling each time to be the same. The Western musical scale is based upon this principle, with the distance between frequencies 2^N and 2^(N+1) divided into seven intervals. The entire scale, including the two octave notes, thus contains eight notes. This constitutes the Major Diatonic Scale, corresponding to the white keys of a piano [?]. Table ?? shows the bottom set of eight frequencies used in the software. The Ratio column shows the ratio between a note and its next highest neighbor [?].

Letter name   Fundamental Frequency (Hz)   Ratio
D             74.250                       10 : 9
E             82.500                       16 : 15
F             88.000                       9 : 8
G             99.000                       10 : 9
A             110.00                       9 : 8
B             123.75                       16 : 15
C             132.00                       9 : 8
D             148.50                       10 : 9
Table 5.1: Standard major diatonic scale frequencies used by Paganini

The notes separated by 9 : 8 constitute a major tone, 10 : 9 a minor tone, and 16 : 15 a semitone. Additionally, between the tone intervals (both major and minor) a further five semitone intervals are inserted: D♯ (E♭), F♯ (G♭), G♯ (A♭), A♯ (B♭) and C♯ (D♭). These notes constitute the black keys of a piano. Often (as with the modern piano tuning standard) the geometric mean of the upper and lower note is used to determine the frequency (rather than 16 : 15) - the so-called even-temperament tuning (giving a ratio of 1 : 2^(1/12), or about 1 : 1.0595, instead of 16 : 15, about 1 : 1.0667). This ensures that all semitone intervals are the same, which is essential where a very wide range of frequencies can be produced simultaneously. For orchestras and many solo instruments, the tuning which gives perfect ratios (thirds and fifths) is preferred, as this is generally thought to sound better than the even-temperament scale. However, this is not possible on a fretted instrument, so the guitar is tuned with an even-tempered system. A full chromatic scale (12 intervals) as used by Project Paganini is shown in Figure ??. Note that the sharpened notes could equally correctly have been shown as the flats of the next highest note.
Figure 5.1: Lowest octave of a chromatic scale, as used by Paganini

The twelve frequencies from D to C♯ are stored in a header file, and the frequencies of the 46 notes playable on the guitar (from a low D on the lowest bass string to the top B on the highest string) are calculated by multiplying these fundamental frequencies by one, two, four or eight (neglecting the notes given by indices 47 and 48). The key is usually explicitly stated in the form of a key signature, indicating which notes are played sharp or flat. Moreover, it indicates which scale the passage is based upon. This can always be overridden by adding a sharp, flat, or natural indicator next to the note. This is called an accidental. A passage of music, then, may be in the key of D Major, in which case the Fs and Cs are, by default, played sharp (one semitone above the normal position unless explicitly stated with accidental notation). The same key signature is also used for a minor key (B minor). The rules for determining the actual key are quite well-established, and are difficult to describe without actually demonstrating the sound of the resultant music. For the purposes of this project, it is sufficient to say that major keys tend to be cheerful, minor keys melancholy. When played on an even-tempered instrument, flat keys (major or minor with flats in the key signature) tend to sound dull, while sharp keys tend to sound bright.

The harmonic content of a particular note is one of its most important characterising features. On a piano, most notes are almost pure sinusoids, modulated by a well-defined envelope. On a guitar, however, there is always a significant amount of energy present at the higher harmonics. This depends upon where along its length the string is plucked - according to Taylor [?], when plucked halfway down, most (more than 80%) of the energy is concentrated in the fundamental; however, moving down to one-fifth of the length of the string, about half of the energy is in the second harmonic. In fact, at one fortieth of the length of the string, only around 5% of the energy is in the fundamental, with the bulk of the energy in the third, fourth and fifth harmonics. This is one way in which a guitarist controls the so-called tone colour of the music.

Unfortunately, the presence of variable levels of higher-order harmonics complicates the analysis process considerably. The flexibility of the back-propagation neural network is used to overcome this complication. As it is quite rare for the third harmonic or above to be present in significant levels, only the problem of the second harmonic is considered. In this case, the second harmonic has the same letter value as the fundamental, but separated by one octave. The precise details of which octave notes are played in a given chord are often left to the performer; for example, in E Major, up to three Es may be played, but the loss or addition of one of these Es has an insignificant effect on the overall sound. Therefore, if an extra E is detected, it will be a very minor difference and is considered to be an acceptable error. Should a higher-order harmonic also be detected and classified as a note, it would form a major third or perfect fifth with the fundamental, which should not fundamentally alter the generated sound (although it would be preferable for this to be ignored by the software).

Timing is also well defined, with notes consisting of fractions of a semibreve or whole note. They are given the names of minim (half note), crotchet (quarter), quaver (eighth), semiquaver (sixteenth), demisemiquaver (thirty-second), hemidemisemiquaver (sixty-fourth), and so on. A note can also be extended by 50% by adding a dot, or 75% by adding a double dot.
Each possible note duration also has a corresponding rest - a period of silence in one or all of the voices. Like notes, rests can also be dotted (extended by 50% or 75%). The time signature, such as 3/4, refers to the rhythm - in this case, three beats, each a crotchet (quarter note). The note durations in a bar must sum to the time signature value (here they should sum to 3/4 of a semibreve). The first beat in a bar is usually played more strongly. The actual rate at which these notes are played is often indicated in crotchets per minute.
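The relationship between the just-intonation ratios of Table 5.1 and the even-tempered semitone can be checked numerically. The following sketch (the function names are my own) computes even-tempered frequencies starting from the low D of 74.25 Hz used by Paganini:

```cpp
#include <cmath>

// Frequency of the n-th even-tempered semitone above a reference f0:
// f_n = f0 * 2^(n/12), so every semitone shares the ratio 2^(1/12).
double equal_tempered(double f0, int n) {
    return f0 * std::pow(2.0, n / 12.0);
}

// The even-tempered semitone ratio, approximately 1.0595, as against
// the just-intonation semitone of 16:15 (approximately 1.0667).
double semitone_ratio() {
    return std::pow(2.0, 1.0 / 12.0);
}
```

For example, equal_tempered(74.25, 12) returns exactly 148.5 Hz: twelve even-tempered semitones give precisely one octave, which is the property that makes the system workable on a fretted instrument, where one fret must serve every string.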
Chapter 6 Algorithm

6.1 Overview
The algorithm consists of a number of processes which are performed on the input signal, each extracting some information to be used in subsequent stages. It proceeds as follows:

1. Extraction of long-term amplitude envelope data (note that at present this is not used; however, it has been put in place to allow for the addition of dynamics notation in the score in the future);

2. Extraction of a separate magnitude signal for each frequency at which notes could be found;

3. Individual note signals are processed with the neural network, and the output of this stage is stored in an intermediate data structure;

4. This data structure is passed through a simple hysteresis thresholder, with the resultant notes being stored (with their onset and duration) in another data structure; and

5. This data structure is converted to LaTeX code for output.
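The hysteresis thresholder of step 4 can be sketched as follows. This is an illustrative reimplementation, not Paganini's actual code; the utp and ltp trigger points correspond to the --utp and --ltp command-line options described in the usage information:

```cpp
#include <cstddef>
#include <vector>

// Hysteresis thresholding of a normalised envelope: a note turns on
// when the signal rises above utp and turns off only when it falls
// below ltp. This suppresses chatter from small fluctuations that
// would retrigger a single fixed threshold. Requires utp > ltp.
std::vector<bool> hysteresis(const std::vector<double>& env,
                             double utp, double ltp) {
    std::vector<bool> on(env.size());
    bool state = false;
    for (std::size_t i = 0; i < env.size(); ++i) {
        if (!state && env[i] > utp) state = true;       // note onset
        else if (state && env[i] < ltp) state = false;  // note release
        on[i] = state;
    }
    return on;
}
```

With utp = 0.85 and ltp = 0.7, a note-on decision requires the envelope to climb above 0.85, and the note is then held until the envelope drops below 0.7, so ripples between the two thresholds do not retrigger the note.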
6.2 Extraction of Dynamic Characteristics
The first pass over the raw sampled data is a very simple estimate of the amplitude of the signal as it changes over time. It is performed with the C++ function arrayclass runmax (arrayclass input, unsigned long length), which takes the data length samples at a time, finds the maximum, repeats this value length times in the output arrayclass, and processes the next chunk until the end is reached. Thus, a fairly rough but representative envelope of the input signal is generated. It is worth noting at this point that several other techniques have been tested for finding the envelope of the signal, such as rectification and filtering as in an AM receiver, but as the chosen method was the least computationally intensive, it has been used for this stage of the process. The coarseness of this envelope detection is easily specified, so both long-term and short-term amplitude variations can be extracted. Long-term variations will be used to determine dynamics notation (indicated on the score as a stretched less-than (<) or greater-than (>) symbol).

Appendix B Usage Information

% paganini [options] filename.wav > other_filename.tex

where options is one or more of -w -W -v -V -h -H -o -O -b -B --fix --utp --ltp --warranty --version -? --help --options --verbose:

--warranty        Display warranty (i.e. no warranty)
--version         Show version number
-?                Give brief usage information + version number
--help            Give brief usage information
--verbose         Show more possibly useful information
--fix (double)    Fine tune fundamental frequencies (e.g. 1.005)
--utp (double)    Specify upper trigger point (between 0 and 1)
--ltp (double)    Specify lower trigger point (between 0 and 1)

Note: utp must be greater than ltp.

Figure B.1: Paganini usage information

It is recommended that Paganini be run with the --verbose option, which provides some information as the program is running (to stderr; i.e. even if the output is redirected to a file, this information will be printed on the terminal). As described in the built-in usage information, the program is executed with its output redirected to a file. This file is a LaTeX file, and may be processed and viewed as follows:

% paganini --verbose --utp 0.85 --ltp 0.7 sevilla.wav > sevilla.tex
% tex sevilla.tex
% xdvi sevilla.dvi

Of course, TeX must be correctly installed for this to work. To add manual corrections, the LaTeX file may be edited by hand, after which it must be run through LaTeX and xdvi again.
Appendix C List of all programs (including scripts)

The paths are relative to the top-level directory of the package.

• src/paganini
The main program.

• src/create training data file
This program takes the file specified on the command line and generates a sparse DFT of it (i.e. it evaluates the DFT at the 46 frequencies of interest). The magnitudes are written to standard output.

• src/process training data
This script uses various other programs to generate a complete normalised list of training data. It takes no command-line arguments.

• src/train N
This program performs N training iterations, each iteration presenting M randomly selected examples to the network (M = number of .dat files). See Section ?? for more details. In addition to the ear.nnw file, train also generates a text list (datalist) of all of the .dat files it finds.

• src/maxamp -m|(-t|-f) -1|-2 file
This program scans through file and writes the largest value to stdout. The first argument specifies whether the magnitude or the time (or frequency; the arguments are equivalent) is of interest, and the second argument specifies the number of columns in the file. This can be used both on the spectrum files generated by create training data and on files which include frequency or time information in the first column as well. It is a general-purpose data-extraction tool.

• src/agb alpha beta
This returns 1 if alpha is greater than beta, and zero otherwise. alpha and beta can be floats (unlike with the regular Unix expr command).

• src/norm file normfactor
This program uses Equation ?? to warp the values stored in file. normfactor should be the maximum amplitude over all .dat files.

• src/playall
This script simply plays all the .wav files in the current directory.

• progs/dumpdata file.wav > file.dat
This program is a generic hack which shows how to use libpaganini's arrayclass, audioclass and other features to dump some feature of an audio file to standard output. Currently it is hacked to dump the envelope of the sound file. The real programs are dumpfreq, dumpraw, and dumptime.

• progs/dumpfreq file.wav > file.freq
This program dumps the first 1 kHz of a magnitude spectrum from the specified audio file to standard output. The output consists of two columns, the first containing the frequencies, the second containing magnitudes. In gnuplot, use the commands

gnuplot> set data style lines
gnuplot> plot "file.freq"

to see the spectrum. This is useful for measuring peak frequencies.

• progs/dumptime file.wav > file.time
This program simply dumps the data from a .wav file to standard output, with the actual time from the start of the file in the first column and the sample value in the second column.

• progs/dumpraw file.wav > file.raw
This program dumps an FFT-based time-frequency plot (currently using code from [?]) to standard output in an intermediate file format suitable for use with raw2pgm. The coarseness is hard-coded, and re-compilation is required to change the default behaviour.

• progs/raw2pgm file.raw > file.pgm
This program reads the .raw file generated by dumpraw and dumps a binary .pgm file to standard output. Note: as the output is binary, if standard output is not redirected, the terminal display could become corrupted. To fix this on a Linux system, type reset.
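For reference, the PGM output format used by raw2pgm is simple enough to sketch. The following illustrative function (not Paganini's actual code) writes an 8-bit binary (P5) PGM image:

```cpp
#include <cstdio>
#include <vector>

// Write an 8-bit binary (P5) PGM image, the format raw2pgm emits.
// A P5 file is a short ASCII header (magic number, width, height,
// maximum grey value) followed by one raw greyscale byte per pixel.
bool write_pgm(const char* path, int width, int height,
               const std::vector<unsigned char>& pixels) {
    if (static_cast<int>(pixels.size()) != width * height) return false;
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::fprintf(f, "P5\n%d %d\n255\n", width, height);
    std::fwrite(pixels.data(), 1, pixels.size(), f);
    std::fclose(f);
    return true;
}
```

Because the header is plain ASCII and the body is one byte per pixel, a time-frequency magnitude array can be dumped as a viewable spectrogram image with almost no formatting code.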
Appendix D List of Classes and their Member Functions

Three principal classes have been written: a flexible array structure, a structure intended to be associated with an audio file, and a neural network. They are each described in turn.
D.1 arrayclass
This class is used for the storage of data. Internally, it is a dynamically allocated array of type complex. Supported member functions include overloaded [], +, =, ==, !=, ! and * operators, which perform subscripted array accesses with bounds checking, concatenation, assignment, equality and inequality tests, a test for empty, and element-by-element multiplication. Other functions supported include:

• complex dfti (long i)
Finds the ith DFT coefficient of the array;

• complex dfti (long i, long start, long finish)
Finds the ith DFT coefficient of the subset of the arrayclass extending from start to finish;

• void dump (float dest [])
Dumps the arrayclass' contents into dest;

• arrayclass env (int all peaks)
Interpolates between peaks, effectively finding the envelope;

• long getsize ()
Returns the size of the arrayclass;

• void load (float source [], long length)
Loads the contents of source from 1 to length into the arrayclass;

• arrayclass mag ()
Returns an arrayclass of the magnitudes;

• double max ()
Returns the largest magnitude;

• double maxnorm ()
Returns the largest squared magnitude;

• void normalise ()
Scales the data so the largest magnitude is unity;

• void print (int what)
Prints the contents of the arrayclass; the argument is one of COMPLEX, MAG, ARG, REAL or IMAG;

• void print (int what, long start, long finish)
Prints the arrayclass from start to finish; "what" is as for the other print function;

• int resize (long size)
Truncates or zero-pads the arrayclass to the new size;

• void rmdc ()
Calculates the average (DC component) and subtracts it from all elements' real component (really should only be used for pure reals!);

• void selfconj ()
Negates the imaginary components of all elements;

• arrayclass subset (long start, long finish)
Returns a subset of the arrayclass from start to finish (both of which may be negative or out-of-bounds, in which case the returned arrayclass will be zero-padded as required);

• void threshold (long start, long finish, float cutoff)
Sets all values with magnitudes less than the given cutoff (a percentage of the maximum) to zero;

• arrayclass wsubset (long start, long finish)
Windowed subset. Currently uses a triangular window (for speed).
D.2 audioclass
This class essentially controls access to a .wav file. Features include automatic header checking and extraction (allowing arbitrary conditions to be placed on the type of file used) and the ability to extract a given chunk of the file into an arrayclass (useful for tight memory situations where the bulk of the file can remain on the disk).

• string get filename ()
Returns the filename associated with the audioclass;

• int good ()
Returns true if the file was successfully opened;

• arrayclass get slab (long start, long finish)
Grabs a chunk of the wavfile, between the specified indices;

• long get length ()
Returns the length of the data section of the wave-file, in samples;

• long get samplerate ()
Returns the sample rate at which the sound file was recorded;

• string get error ()
Gets an error message, "no errors" if OK;

• short get errorno ()
Gets an error number;

• void print ()
Straight dump of the wavfile, which saves having to load it into memory.
D.3 nnwork
The neural network class is the most basic form of back-propagation neural network. The class nnwork has three constructors:

nnwork brain ();
nnwork brain ("filename");
nnwork brain (10, 7, 3);

The first creates an empty network, the second loads the neural network from a file, and the last creates a network with 10 inputs, 3 outputs and 7 hidden-layer nodes, with initially random weights distributed between +/- 0.5. Member functions include:

• train (float data [], float desired [], float max MSE, float eta)
This function is used to train the network. data [] is the input data, and desired [] is the desired output. Both of these parameters must match (or exceed) the dimensions of the network. max MSE is the maximum allowable mean squared error, while eta is the learning rate (which should probably be between 0.3 and 0.05).

• run (float data [], float result [])
This function simply runs data [] through the network, placing the results in result []. Again, these must be of the appropriate size.

• int load (char *filename)
This function reads the network weights from the file with the specified filename. If a failure occurs for whatever reason, it returns 0; otherwise it returns 1.

• int save (char *filename)
This function saves the network weights to the file with the specified filename. If a failure occurs for whatever reason, it returns 0; otherwise it returns 1.

• int get layersize (int layer)
This function returns the number of nodes in the specified layer. layer may take on the values ALL, INPUT, HIDDEN and OUTPUT, with the ALL option providing the total number of nodes in the network (a convenient check to see whether the network is empty or not).
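To make the 10-7-3 example concrete, here is an illustrative sketch of the forward pass through one fully connected sigmoid layer, the building block of a back-propagation network like nnwork. This is not Paganini's implementation: the function names are mine, biases are omitted, and std::vector replaces the raw float arrays:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Forward pass of one fully connected layer with sigmoid activation:
// each output node takes the weighted sum of all inputs and squashes
// it into (0, 1) with the logistic function.
std::vector<double> layer_forward(const std::vector<double>& in,
                                  const std::vector<std::vector<double>>& w) {
    std::vector<double> out;
    for (const auto& row : w) {          // one weight row per output node
        double sum = 0.0;
        for (std::size_t i = 0; i < in.size(); ++i) sum += row[i] * in[i];
        out.push_back(1.0 / (1.0 + std::exp(-sum)));  // sigmoid
    }
    return out;
}

// Random weights in [-0.5, 0.5], matching nnwork's initialisation.
std::vector<std::vector<double>> random_weights(int rows, int cols) {
    std::vector<std::vector<double>> w(rows, std::vector<double>(cols));
    for (auto& row : w)
        for (auto& x : row)
            x = static_cast<double>(std::rand()) / RAND_MAX - 0.5;
    return w;
}
```

Chaining two such layers with weight matrices of sizes 7x10 and 3x7 reproduces the shape of nnwork brain (10, 7, 3); training would then adjust the weights by back-propagating the output error.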
Appendix E List of Functions

Many potentially useful DSP functions have been written and are stored in dsp.cc (part of libpaganini). In each case, they have been designed with correctness in mind, as opposed to speed. They include:

• arrayclass corr (arrayclass data2, arrayclass data1)
Correlates two arrayclasses;

• arrayclass fft (arrayclass data, int direction)
Finds the FFT of an arrayclass;

• arrayclass hilbert (arrayclass data)
Finds the Hilbert transform of an arrayclass;

• arrayclass fftconv (arrayclass arg1, arrayclass arg2)
Performs a convolution using FFTs;

• arrayclass avgfilt (arrayclass data, long length)
Painfully slow moving-average filter;

• arrayclass runmax (arrayclass data, long length)
See the description in Chapter ??;

• arrayclass pdft (arrayclass data, long start, long finish)
Calculates part of the DFT of data (from start to finish).
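As a worked example of the simplest of these, the block-maximum envelope runmax described in Chapter 6 can be sketched as follows, with std::vector<double> standing in for the project's arrayclass (an assumption made for self-containment):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of runmax: take the input length samples at a time, find the
// block maximum, and repeat that value across the block in the output,
// producing a rough staircase envelope of the signal. A trailing
// partial block is handled the same way, so output size equals input.
std::vector<double> runmax(const std::vector<double>& in, std::size_t length) {
    std::vector<double> out;
    for (std::size_t i = 0; i < in.size(); i += length) {
        std::size_t end = std::min(i + length, in.size());
        double m = *std::max_element(in.begin() + i, in.begin() + end);
        for (std::size_t k = i; k < end; ++k) out.push_back(m);
    }
    return out;
}
```

For instance, runmax({1, 3, 2, 5, 4, 0}, 2) yields the staircase {3, 3, 5, 5, 4, 4}. Larger block lengths give a coarser, longer-term envelope, which is exactly the coarseness parameter used to separate short-term note onsets from long-term dynamics.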