Another tiresome monologue about the trials of digital life
Title Slide
Would anybody notice if testers did something else in life?
Les Hatton
Professor of Forensic Software Engineering, CISM, Kingston University
Version 1.1: 02/Dec/2010
Overview
Part 1: Current state of the art in systems
Part 2: A cloud of defects and a silver-lining
Continuing challenges
• Poor quality complex infrastructure
• Systems of sub-systems
• Jargon, box-ticking and engineering esteem
• Too many Klingon interfaces
• Generally sloppy systems quality
• Testing has too narrow a focus
Some software is exceedingly good … (Space Shuttle software)
[Images copyright NASA and USAF: Space Shuttle Atlantis, 14-May-2010; “Me”]
… but most of it is not (all within 90 minutes at Heathrow, 11-May-2010)
[Photographs: Check-in; Departures, “This system is rubbish” (departures official); Departures lounge; On the plane]
Systems of sub-systems: speaking of check-in at Heathrow …
• Feb 2010: I print off boarding cards online.
• Departures won’t let me in because they can’t read it.
• SAS can’t issue another boarding card because I already have one.
• I generously offer to lie on the runway until they sort it out.
• The SAS duty manager gives me a written one if I promise “I am me”. I point out that it’s lucky I’m not Bertrand Russell. This comment is wasted. Note to self: no more jokes at check-in.
• Departures won’t let me in because I now have two passes, until they find the person who refused the first one.
The strange case of the FAX machine and the telephone sex line
• I receive a telephone bill for the FAX machine containing two 20m05s calls to a telephone sex line at 4am.
• I call BT. “Computer says” it’s correct.
• I ‘elevate priority’ and provide FAX logs. BT says it will investigate.
• Final demand received.
• I call BT. “Computer says” it’s correct.
• Notice of removal of service arrives.
• I call BT and invite them to take me to court.
• BT calls me. “Computer says” it’s correct, but they will let me off “if I don’t do it again”.
(6 weeks)
Jargon, box-ticking and engineering esteem
• When is a requirement not a requirement? “The system shall engage all stake-holders consensually whilst pushing back the envelope in a timely and budget aware fashion.”
• Advice to testers: Avoid speaking to this person. They are Klingon. Say HISlaH* and leave quickly.
*Allegedly ‘Yes’ in Klingon. http://www.kli.org/tlh/phrases.html
Result: We are still screwing big projects up as if there is no tomorrow
• NHS “Connecting for Health”: budget over 16 billion pounds
• Child Support Agency
• Passport Office
• Benefit Office
• C-Nomis (2009) (Ministry of Justice and Home Office): “Nobody sure how 161 million pounds had been spent”
• Transport Direct cycle route planner (2009). This absorbed 2.7 million pounds but failed to replace a public site with far greater coverage and functionality (cyclestreets.net), which cost around 6,000 pounds.
More seriously: The Nimrod explosion, Afghanistan 2006 (Daily Mail, 29/Oct/2009)
• MOD Head of Air: “Complete failure to do his job in relation to the Nimrod Safety Case”
• MOD Safety Manager: “Most of the time he was clearly out of his depth”
• BAE Systems Chief Airworthiness Engineer: “Pushed through too quickly because he was too ambitious and assumed it was safe anyway”
• BAE Flight Systems and Avionics manager: “Significant responsibility for poor planning, poor management and poor execution of project”
• QINETIQ Task manager of Safety Case project: “agreeing on behalf of Qinetiq to sign off project without seeing or reading the reports”
• QINETIQ Technical Assurance Manager for Nimrod Safety Management: “Guilty of allowing the manifestly inappropriate BAE reports to be approved”
Note the extensive safety bureaucracy and titles.
The Nimrod explosion, Afghanistan 2006
“The Nimrod Safety Case was a lamentable job from start to finish. It was riddled with errors, it missed key dangers, its production is a story of incompetence, complacency and cynicism.” Charles Haddon-Cave QC
Honesty is the best policy
Klingon interfaces 2010 – The occasional heating system
Klingon interfaces 2010 – The occasional heating system
• One frequency and a range of 100m …
• No diagnostics. That is “No” as in “NO”, Nada, nowt.
• Relies on the “Gurgle Protocol”. Exotic reboot procedure.
• Two sets of batteries, one for display and one to control wireless. No indication of battery state.
• No override. That is “No” as in “NO”, Nada, nowt.
• Only breaks down in cold weather.
• Designed by Klingons for Klingons.
More interfaces from hell: Credit card PIN readers
Consider this from the point of view of the blind
Sloppy systems: Why I still hate digital television
• 2007 Freeview box 1: Download yes / no?
• 2008 Freeview box 2: Crashed every 3 hours
• 2009 Freeview box 3: Crashed every 7 hours
• 2010 Freeview box 4: Crashes every 4 hours (I don’t care any more)
• Note Hatton’s Law: Television quality is inversely proportional to the number of channels on offer.
Sloppy systems: Vodafone’s attempt to upgrade my Mobile WiFi connection
Sloppy systems: Disconnecting for Health
• 04/01/2010: Following failure of CRS, your scribe discovers a new algorithm to find a patient in an NHS hospital.
• 11/08/2008: Failures in the new NHS computer system have meant hundreds of suspected cancer sufferers in London have had their operations cancelled.
• People in contact with MRSA could not be contacted. Many appointments lost. http://www.bbc.co.uk/1/hi/england/7555077.htm
Algorithm to find patient in NHS hospital using Connecting for Health CRS
• Go to top floor
• 1: Ask duty ward nurse
• 2: If heard of patient, visit and go to 4
• 3: If floors below, descend one floor and go to 1
• 4: Leave
So: are there enough of us?
• A good developer produces around 20,000 LOC per year of released tested code.
• About 50% of the budget is testing.
• We can then assume a good tester can test around 20,000 LOC per year.
• 2,500 people are registered with SIGIST.
• Assuming everybody is out there testing, our test capacity is perhaps 50 MLOC per year for the UK.
• Based on relevant populations, the test capacity of the world is about 1 BLOC per year.
• There are hundreds of BLOC (e.g. Wikipedia and other web sources).
• I rest my case.
A summary of threats to testing
• We should be testing systems as well as subsystems.
• We are being swamped in useless jargon and technologies. Test what you can build. Build what you can test. Your job is to protect the end-user from Klingon management and developers.
• There appears to be almost no testing involvement in human interface design.
• There is still little management awareness of the real cost of systems failure.
A need to expand testing
• Many modern systems are subject to attack.
• System testing rarely includes an explicit brief to assess security awareness.
• This is not a new discipline; it is simply an extension of the standard role of testing.
• You need to be aware of the following …
Botnets: the software you are testing may be on a compromised machine
• Thousands of PCs and PDAs controlled by remote criminal operators.
• The Mariposa botnet (Spain, Mar 2010) had details of 800,000 people gleaned from 12.7 million machines.
• Waledac (US, Feb 2010) (hundreds of thousands of PCs) was used for sending hundreds of millions of spam messages each day.
• Lethic (Jan 2010), Mega-D (Nov 2009), Torpig (May 2009) and McColo (Nov 2008) were all taken down after efforts by security researchers.
• About every 3 weeks the PC of a personal acquaintance is compromised.
People will lose data from your system. Can you make it harder?
Organisations have an appalling record …
• Mar 2010, Barnet council lost 9,000 children’s records.
• Aug 2008, Zurich Insurance lost about 600,000 records (came to light in Mar 2010).
• Jan 2010, Ladbrokes lost 4.5 million customer records.
• Oct 2008, 25 million child benefit records lost (this included 350 on the witness protection scheme).
• 2005-2008, HMRC reported 7 other significant data loss incidents.
• Dec 2007, hundreds of thousands of patient records from nine NHS trusts went missing. The NHS also routinely shares patient data with local councils.
• In 2008, the MoD lost almost as many records as the NHS.
• Dec 2007, the Department of Transport admitted three major data loss incidents, including the records of 3 million learner drivers.
• Aug 2008, HSBC lost an entire server with 159,000 customer records.
And now for something completely different …
Is it possible to uncover fundamental principles?
For example, defects are reported to cluster.
Is there a general mechanism for this? Is it exploitable in testing?
And now for something completely different …
• Some thoughts about building systems
• What is scale-free behaviour?
• Scale-free behaviour in component size
• Is scale-law behaviour persistent in software systems?
• A tentative unifying principle
• Conclusions
Building systems
When we build a system we are making choices:
• Choices on functionality
• Choices on architecture
• Choices on programming language(s)
There is a general theory of choice: Shannon information theory.
Building systems
Some notes on Shannon information theory
Shannon information theory is based on assembling messages from non-divisible pieces, such as bits. Suppose we have an N-bit stream of 1s and 0s. There are $2^N$ possible ways of arranging these. Shannon was interested in representing the complexity or information content of these and realised, following Hartley (1928), that $\log_2(2^N) = N$ was related to the number of choices needed to find a particular combination.
Building systems
[Figure: Shannon information and decisions]
Building systems
Software component size: approximate
Number of lines of code. This is quite dependent on the programming language (consider the influence of the pre-processor in C and C++, for example).
Software component size: better
Based on tokens of a programming language.
Building systems from tiny pieces
Tokens of language
Fixed tokens. You have no choice in these. There are 49 operators and 32 keywords in ISO C90. Examples include the following in C (but also in C++, PHP, Java, Perl …): { } [ ] ( ) if while * + *= == // / , ; :
Variable tokens. You can choose these. Examples include: identifier names, constants, strings.
Every computer program is made up of combinations of these (note also the Böhm-Jacopini theorem (1966)).
What is scale-free behaviour?
In this context, scale-free behaviour refers to a phenomenon whose frequency of occurrence is given by a power law. Consider word-counting in a document. If $n$ is the total number of words in a document and $n_i$ is the number of occurrences of word $i$ (the $i$th most frequent word), then it is observed (originally by Zipf (1949)) that for many texts,
$$ f_i = \frac{c}{i^p} $$
where $c$, $p$ are constants and
$$ f_i \equiv \frac{n_i}{n} $$
What is scale-free behaviour? Re-writing as
$$ n_i = \frac{nc}{i^p} $$
this is usually shown as
$$ \ln n_i = \ln(nc) - p \ln i $$
So we are looking for a straight line of slope $-p$ when $\ln n_i$ is plotted against $\ln i$.
Examples from the real world
• Physics: specific heat of spin glasses at low temperature, Caudron et al (1981)
• Biology: Protein family and fold occurrence in genomes, Qian et al (2001)
• Biology: Evolutionary models, Fenser et al (2005)
• Economics: Income distributions, Rawlings et al (2004)
• Software systems: incoming and outgoing references and class sizes in OO systems, Potanin et al (2002)
• Many excellent examples in Newman (2006)
• Studies of C systems also reveal scale-free behaviour (Jones): http://www.knosof.co.uk/cbook/cbook.html
Application to software systems
Smoothed (cdf) data for 21 systems, C, Tcl/Tk and Fortran, combining 603,559 lines of code distributed across 6,803 components, Hatton (2009), IEEE TSE.
A model for emergent power-law size behaviour using Shannon entropy
Suppose component $i$ in a software system has $t_i$ tokens in all, constructed from an alphabet of $a_i$ unique tokens. First we note that
$$ a_i = a_f + a_v(i) $$
where $a_f$ counts the fixed tokens of the language ({ } [ ] ; while …) and $a_v(i)$ the variable tokens (identifier names and constants).
A model for emergent power-law size behaviour using Shannon entropy
An example from C:
Fixed (18): void int ( ) [ ] { , ; for = >= -- -
Variable (8): bubble a N i j t 1 2

    void bubble( int a[], int N)
    {
        int i, j, t;
        for( i = N; i >= 1; i--)
        {
            for( j = 2; j <= i; j++)
            {
                if ( a[j-1] > a[j] )
                {
                    t = a[j-1];
                    a[j-1] = a[j];
                    a[j] = t;
                }
            }
        }
    }

Total (94)
38
A model for emergent power-law size behaviour using Shannon entropy
For an alphabet $a_i$, the Hartley-Shannon information content density $I'_i$ per token of component $i$ is defined by
$$ I'_i \equiv \frac{I_i}{t_i} = \frac{\log(a_i a_i \cdots a_i)}{t_i} = \frac{\log\left(a_i^{t_i}\right)}{t_i} = \log(a_i) $$
where the product contains $t_i$ factors. We think of $I'_i$ as fixed by the nature of the algorithm we are implementing.
Consider now building a system as follows: a general software system of $T$ tokens divided into $M$ pieces, each with $t_i$ tokens and each having an externally imposed information content density property $I'_i$ associated with it.
[Figure: pieces 1, 2, 3, …, M; piece $i$ has $t_i$ tokens and density $I'_i$]
$$ T = \sum_{i=1}^{M} t_i $$
General mathematical treatment
It turns out (see appendix) that the most likely solution is
$$ p_i \sim (a_i)^{-\beta} $$
where $p_i$ is the probability of a component of size $t_i$ tokens appearing and $a_i$ is the number of unique tokens used to produce the $i$th component.
This states that in any software system, conservation of size and information (i.e. choice) is overwhelmingly likely to produce a power-law alphabet distribution. (Think ergodic here.)
One last little bit of maths
Note that for small components, the fixed token overhead is a much bigger proportion of all tokens, $a_f \gg a_v(i)$, so
$$ p_i = \frac{1}{Q(\beta)}\left(a_f + a_v(i)\right)^{-\beta} = \frac{(a_f)^{-\beta}}{Q(\beta)}\left(1 + \frac{a_v(i)}{a_f}\right)^{-\beta} \approx \frac{(a_f)^{-\beta}}{Q(\beta)} $$
which is constant. For large components, the general rule takes over:
$$ p_i \sim (a_i)^{-\beta} $$
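Application to software systems
So we are looking for the following signature: on a plot of $\log p_i$ against $\log i$, a flat region at the level $(a_f)^{-\beta}$ for small components, followed by the power law $p_i \sim (a_i)^{-\beta}$ for large ones.
[Figure: expected signature, log p_i against log i]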
Application to software systems
Smoothed (cdf) data for 21 systems, C, Tcl/Tk and Fortran, combining 603,559 lines of code distributed across 6,803 components, Hatton (2009), IEEE TSE.
Application to software systems
Six randomly chosen individual systems
So what does this have to do with defects?
Suppose there is a constant probability $P$ of making a mistake on any token. The number of defects in component $i$ is then given by $d_i = P t_i$. Then
$$ p_i = \frac{1}{Q(\beta)}(a_i)^{-\beta} \approx (t_i)^{-\beta} \approx (d_i)^{-\beta} $$
This step uses Zipf’s law, Hatton (2009).
So defects will also be distributed according to a power law.
Defect clustering in the NAG Fortran library (over 25 years)

Defects   Components   XLOC
0         2865         179947
1         530          47669
2         129          14963
3         82           13220
4         31           5084
5         10           1195
6         4            1153
7         3            1025
>7        5            1867
Conditional probability of finding defects*
* See Hopkins and Hatton (2008), http://www.leshatton.org/NAG01_01-08.html
Some random thoughts about defects
I think it is useless to ask why so many components exhibit zero defects. It is a statistical side-effect and as useless as asking why somebody has won the lottery. Note that defect clustering strongly implies that if you find a defect, you should carry on looking, as there is an increased probability of finding more, independently of how the system was built, what it does or what language was used.
Some random thoughts about Testing
• We need to keep to the fundamental principles of testing and avoid or clarify the jargon.
• We need to be involved in more than just testing sub-systems.
• Too many interfaces are depressingly poor. Testing is there to represent the end-user’s interests as well.
• There are not enough of us and progress has been disappointing in the last 20 years.
References
My writing site: http://www.leshatton.org/
Specifically, http://www.leshatton.org/variations_2010.html
Thanks for your attention.
Appendix: General mathematical treatment
The number of ways of organising this is:
$$ W = \frac{T!}{t_1!\, t_2! \cdots t_M!} $$
Stirling’s approximation plus taking logs as usual gives:
$$ \ln W = T \ln T - \sum_{i=1}^{M} t_i \ln t_i $$
In physical systems, we seek to find the most likely arrangement by maximising this subject to two constraints:
$$ T = \sum_{i=1}^{M} t_i $$
and
$$ I \equiv \sum_{i=1}^{M} I_i = \sum_{i=1}^{M} t_i I'_i \equiv \sum_{i=1}^{M} t_i \left(\frac{I_i}{t_i}\right) $$
where the $I'_i$ are assumed fixed externally and so are not varied.
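General mathematical treatment
Using Lagrange multipliers and setting
$$ \delta(\ln W) = 0 $$
leads to the most likely distribution being given by
$$ p_i \equiv \frac{t_i}{T} = \frac{e^{-\beta I'_i}}{\sum_{j=1}^{M} e^{-\beta I'_j}} $$
where $p_i$ can be considered the probability of piece $i$ occurring with information content density $I'_i$, and $\beta$ is a constant (the Lagrange multiplier attached to the information constraint).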
General mathematical treatment
To summarise, for large $T$ and $t_i$, the most likely distribution of the $I'_i$, subject to the constraints of $T$ and $I$ held constant,
$$ T = \sum_{i=1}^{M} t_i \quad \text{and} \quad I = \sum_{i=1}^{M} t_i I'_i $$
is
$$ p_i \equiv \frac{t_i}{T} = \frac{e^{-\beta I'_i}}{\sum_{j=1}^{M} e^{-\beta I'_j}} $$