I think I have a great example of where the true system is not accurately reflected by a generally understood statistical model.

Say we have a disease among the populace where 1 in every 100,000 people IS sick. This is the true situation, which in fact we do NOT know but try to discover using statistics.

In a lab we "prove" a test that is 95% accurate in establishing "sick" over "not sick" when testing a truly sick person, one that was initially diagnosed as possibly sick because of some symptoms.

Now the government, with its pseudo-scientists, decides to test the whole populace and see what they have on their hands.

A populace of 10 million people is all tested using this 95% test. Say you are also tested, and the test says "sick" in your case.

Question: do you now need to worry and seek immediate treatment?

What do you think ? Answer truthfully !


...


Properly developed probability math will show that there are only 100 truly sick people in the population of 10 million, while the populace-wide testing will earmark no less than 500,000 people as sick! (That 500,000 follows because a test that is only 95% accurate will also wrongly flag roughly 5% of the nearly 10 million healthy people as sick.)

So roughly 499,900 people who tested sick AREN'T truly sick! They are just diagnosed the wrong way.

In effect you (as a diagnosed sick person) have only about a 100/500,000 = 0.02% chance that you are TRULY sick and need to worry and get treatment.

In effect the government just wasted a whole lot of money testing 10 million people, creating upset over patently useless test results.

But how can this be? The test data that confirmed the 95% accuracy of the test was correctly developed, and "data doesn't lie"?


When wrongly applied, the interpretation method of a truthful data set often DOES lie, meaning that it gives a totally false picture of what is really going on. In this case the dataset used to prove the test isn't to blame; the processing and application method is.


Along the same lines one can compare the Yardstick and Measurement based rating systems. It then turns out that yardstick systems are more sensitive to flaws than parameter-model driven systems like Texel, where the developed model is checked against the entire dataset for validity rather than created from it through some statistical processing.

So one must be really careful with overly simple notions in the field of statistics and the processing of statistical data.

Wouter




Last edited by Wouter; 03/15/06 08:53 AM.

Wouter Hijink
Formula 16 NED 243 (one-off; homebuild)
The Netherlands