Sunday, November 15, 2015

Testing VE-SA aspects in marriage couples Part 2

In the first part of this study we tested VE-SA aspects in married couples on a new set of data from Wezembeek, Belgium. That test failed to confirm the hypotheses that an earlier study had found support for. Some readers have pointed out that the test data collection was rather small compared to the one used in the original research. That is true, but much more marriage data is available on the internet, and I have now repeated the test on another data set, from Adegem, Belgium. This sample contains 1800 marriages, so more than 3000 persons with known birth and marriage dates.
The results are available as public Google Docs: spreadsheet4 and spreadsheet5.

As in the previous test, the hypotheses are not confirmed: people with a VE-SA aspect do not have a larger age difference with their partner, and they do not marry later.
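The post does not say which statistical test was used for these comparisons. A minimal sketch of one reasonable choice, assuming each person has a boolean VE-SA flag and an age at marriage in years, is a one-sided permutation test on the difference in group means; it makes no assumption about the shape of the age distribution. The function name and the sample values below are hypothetical, not the author's method or data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Returns the observed difference in means and a p value: the share of
    random relabellings whose difference is at least as large as observed.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the VE-SA labels
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            at_least_as_large += 1
    # The +1 terms count the observed labelling itself, so p is never 0.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical ages at marriage in years, NOT the Adegem data.
with_aspect = [24.1, 27.5, 22.8, 30.2, 25.6]
without_aspect = [25.0, 23.9, 26.4, 24.7, 28.1, 22.5]
diff, p = permutation_test(with_aspect, without_aspect)
print(f"difference = {diff:.2f} years, one-sided p = {p:.3f}")
```

The same function applies to the age difference between partners: pass the absolute age gaps of the two groups instead of the marriage ages.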

Why does it keep failing on new data? I took a closer look at the Gauquelin data used in the original study. They contain data for 39592 families with at least one child; couples who remained childless are not included. For a little over 20000 families Gauquelin had found the birth dates of both parents, and these are the data used in Tarvainen's paper. For another 18000 families Gauquelin found the birth date of only one parent, and these data were not used in the paper. However, the hypothesis of a delayed marriage can be tested on these 18000 parents as well, using the age at the birth of the first child as a proxy for the marriage date. I did that exercise, and the result is available here: spreadsheet3.
Again it fails to confirm the hypothesis. Both the men and the women with a VE-SA aspect have the first child a little earlier than average (21 days earlier for women, 72 days earlier for the men) in this unused portion of the Gauquelin data, rather than the delay expected by the astrologers.
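The proxy itself is simple date arithmetic: the parent's age in days on the child's birth date, averaged per group, with the difference between groups expressed in days. A minimal sketch, assuming each record holds the parent's birth date, the birth date of the child found for that family, and a boolean VE-SA flag; the record layout and function names are hypothetical, not taken from Gauquelin's files.

```python
from datetime import date
from statistics import mean


def age_in_days(birth: date, event: date) -> int:
    """Age of a person on the date of an event, in days."""
    return (event - birth).days


def mean_shift_days(records):
    """Mean age (days) at a child's birth for parents WITH a VE-SA aspect,
    minus the mean for parents without one. Negative means 'earlier'.

    records: iterable of (parent_birth, child_birth, has_ve_sa) tuples.
    """
    with_aspect, without = [], []
    for parent_birth, child_birth, has_ve_sa in records:
        age = age_in_days(parent_birth, child_birth)
        (with_aspect if has_ve_sa else without).append(age)
    return mean(with_aspect) - mean(without)


# Hypothetical records, not Gauquelin data.
records = [
    (date(1900, 1, 1), date(1925, 1, 1), True),
    (date(1900, 1, 1), date(1925, 1, 11), False),
]
print(mean_shift_days(records))
```

A result such as -21 for the women or -72 for the men would mean the aspect group had the recorded child that many days earlier on average, the opposite of the expected delay.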

This casts further doubt on the idea of using the birth date of the first child as a proxy for the marriage date. The problem is this: Gauquelin's data contain 39592 families but only 45807 children. At the time, the average fertility rate in France was around 2.5 children per woman, and that average included the women who remained childless; since every family in this collection has at least one child, its average should be higher still. With almost 40000 families we would therefore expect more than 100000 children, so at least 60000 children are missing from the families in this collection.

For 35000 families (almost 90% of the total) the Gauquelin records contain only one child. Was it the first child, the second or the last? We can only say it was "a" child of the given family. If a woman gave birth at age 33, we cannot tell whether she married at 20 or at 30. And even if she married at 30, it may already have been her second marriage: France had over 600000 "war widows" after WWI, and most of them remarried within a few years. With so many children missing from the collection, and with no information on second or even third marriages, it makes no sense to use the age at the birth of "a" child as a proxy for the date of first marriage. And that is why the hypothesis fails when tested on data collections that do have exact marriage dates.
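The missing-children estimate can be made explicit. Because the 2.5 figure averages over all women, childless ones included, the average among mothers is 2.5 divided by the share of women who had children. The post does not give the childless share, so the sketch below treats it as an assumed parameter; the constant and function names are illustrative.

```python
FAMILIES = 39592           # families in the Gauquelin collection
RECORDED_CHILDREN = 45807  # children recorded for those families
TFR = 2.5                  # approximate French fertility rate quoted in the post


def expected_children(families: int, tfr: float, childless_share: float) -> float:
    """Expected number of children in families that have at least one child.

    `tfr` averages over ALL women, childless ones included, so the average
    among mothers is tfr / (1 - childless_share).
    """
    return families * tfr / (1 - childless_share)


print(f"recorded children per family: {RECORDED_CHILDREN / FAMILIES:.2f}")
# The childless share is an assumption; the post does not give a figure.
for share in (0.0, 0.1, 0.2):
    expected = expected_children(FAMILIES, TFR, share)
    missing = expected - RECORDED_CHILDREN
    print(f"childless share {share:.0%}: expected ~{expected:,.0f}, missing ~{missing:,.0f}")
```

Without any childless adjustment the expected total is about 99000, leaving roughly 53000 children missing; the shortfall passes 60000 once the assumed childless share exceeds about 6.5%, and at 10% it is about 64000.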

A few things can be learned from this test:
* We have to be very careful with small effects that reach a significant p value only when tested on a very large data set. All kinds of small biases and unexpected irregularities in the data can produce a significant p value in the calculations.
* This is especially true when the data are very incomplete, with entire years missing.
* Testing on a second and third data collection (where possible) is a good way to see whether the effects keep showing up. In about 90% of cases such a test will fail to confirm the results of the first one. It is really difficult to find effects that keep showing up in subsequent tests, but confirmation in subsequent tests is what it takes to build stronger evidence.
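The first point can be illustrated with a back-of-envelope z test. A minimal sketch, assuming normally distributed outcomes with a known standard deviation and two equal-sized groups: the same small offset of 0.05 standard deviations, whatever its source (a real effect or a small bias in the data), is far from significant with a few hundred people per group but yields a very small p value with tens of thousands. The 0.05 value and the group sizes are illustrative, not taken from the post.

```python
from math import erfc, sqrt


def two_sided_p(effect_sd: float, n_per_group: int) -> float:
    """Two-sided p value of a z test for a difference in means of
    `effect_sd` standard deviations between two equal-sized groups."""
    z = effect_sd / sqrt(2 / n_per_group)  # standard error of the difference is sqrt(2/n) SD
    return erfc(abs(z) / sqrt(2))          # P(|Z| >= z) for a standard normal Z


# A 0.05 SD shift is the kind of offset a small sampling or recording
# bias could produce; it is an illustrative value only.
for n in (500, 2_000, 20_000):
    print(f"n per group = {n:>6}: p = {two_sided_p(0.05, n):.3g}")
```

The p value shrinks as the sample grows even though the offset stays the same size, which is why a significant result on one large collection says little about whether the offset is a real effect or a quirk of that collection.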
