Sunday, November 15, 2015

Testing VE-SA aspects in marriage couples Part 2

In the first part of this study we tested VE-SA aspects in married couples on a new set of data from Wezembeek, Belgium. That test failed to confirm the hypotheses found in an earlier study. Some readers have observed that the test data collection was rather small compared to the original research. That is true, but there are many more marriage data waiting on the internet, and I have now repeated the test on another set of data from Adegem, Belgium. This is a sample of 1800 marriages, so more than 3000 persons with birth and marriage dates.
The results are available as public google docs: spreadsheet4 and spreadsheet5.

Just like in the previous test the hypotheses are not confirmed. People with a VE-SA aspect do not have a bigger age difference with their partner and do not marry later.
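For readers who want to run this kind of comparison themselves, here is a minimal sketch of a permutation test for the difference in mean age at marriage between people with and without a VE-SA aspect. It is not the calculation used in the spreadsheets. The function name, the example ages and the number of permutations are assumptions made for illustration only.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    group_a, group_b: lists of numbers (e.g. age at marriage in years).
    Returns the share of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical ages at marriage, NOT taken from the Adegem data.
with_aspect = [24, 27, 22, 31, 26, 25, 29, 23]
without_aspect = [25, 26, 23, 30, 27, 24, 28, 22]
print(permutation_p_value(with_aspect, without_aspect))
```

A large p value here only means the two groups are not distinguishable in this sample. It says nothing about why.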

Why does it keep failing in tests on new data? I took a closer look at the Gauquelin data that were used in the original study. They contain data for 39592 families with at least one child. There are no data for couples that remained without children. For a little over 20000 families Gauquelin had found the birth dates for both parents, and these are the data that were used in Tarvainen's paper. For another 18000 families Gauquelin found the birth date for only one of the parents, and these data were not used in the paper. But the hypothesis for delay of the marriage can be tested on these 18000 parents as well, using the age at birth of first child as a proxy for the marriage date. I did that exercise and the result is available here: spreadsheet3.
Again it fails to confirm the hypothesis. Both the men and the women with a VE-SA aspect had their first child a little earlier than average (21 days earlier for the women, 72 days earlier for the men) in this unused portion of the Gauquelin data, rather than showing the delay expected by the astrologers.

This puts even more question marks behind the idea of using birth date of the first child as a proxy for marriage date. The problem is this: we have 39592 families in Gauquelin's data but only 45807 children. In those days the average fertility rate in France was around 2.5 children per woman, and that included the women who remained childless. This implies that a lot of children are missing in Gauquelin's collection. With almost 40000 families we would expect more than 100000 children, so more than 50000 children are missing from the families in this collection. For 35000 families (almost 90% of the total) we have only one child in the Gauquelin records. Was it the first child, the second or the last? We can only say it was "a" child in the given family. If we have a woman who gave birth at age 33 then we can't tell if she married at age 20 or at age 30. And even if she married at age 30, it may already have been her second marriage. France had over 600000 "war widows" after WWI and most of them remarried within a few years. With so many children missing from the collection and with no information on second or even third marriages, it makes no sense to use the age at birth of "a" child as a proxy for first marriage date. And that's why the hypothesis fails when tested on data collections that do have the exact marriage dates.
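The arithmetic behind the missing children can be sketched in a few lines. The family and child counts and the fertility rate of 2.5 come from the paragraph above. The function name is my own, and because the 2.5 figure includes childless women while these families all have at least one child, the result should be read as a conservative lower bound.

```python
def missing_children(families, children_recorded, fertility_rate):
    """Estimate how many children are absent from a family collection.

    Returns (expected, missing, recorded_per_family).
    """
    expected = families * fertility_rate
    missing = expected - children_recorded
    recorded_per_family = children_recorded / families
    return expected, missing, recorded_per_family


# Counts quoted in the post; 2.5 is the approximate fertility rate.
expected, missing, per_family = missing_children(39592, 45807, 2.5)
print(f"expected children:  {expected:.0f}")
print(f"missing children:   {missing:.0f}")
print(f"recorded per family: {per_family:.2f}")
```

With barely more than one recorded child per family, there is no way to tell which child in the birth order each record represents.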

A few things can be learned from this test:
* We have to be very careful when finding small effects that get a significant p value only when we test on a very big data set. There are all kinds of small biases and unexpected irregularities in the data that can produce a significant p value in calculations.
* This is especially so when we have very incomplete data with entire years missing.
* Testing on a second and third data collection (if possible) is a good way to see if the given effects keep showing up or not. In about 90% of the cases this will fail to confirm the results from the first test. It is really difficult to find effects that keep showing up in subsequent tests, but confirmation in subsequent tests is what it takes to get stronger evidence.
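The first point in the list above can be made concrete with a small sketch. It uses the normal approximation to the binomial and compares the same one-percentage-point deviation from a 50% expectation in a small and in a very large sample. The numbers are invented for illustration and have nothing to do with the Gauquelin data.

```python
import math


def one_sided_p_value(successes, n, p0):
    """Normal-approximation p value for seeing at least `successes`
    hits in n trials when the expected hit rate is p0."""
    expected = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (successes - expected) / sd
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical: a bias shifts the hit rate from 50% to 51%.
p_small = one_sided_p_value(510, 1_000, 0.5)
p_large = one_sided_p_value(51_000, 100_000, 0.5)
print(f"n = 1000:   p = {p_small:.3f}")
print(f"n = 100000: p = {p_large:.2e}")
```

The deviation is identical in both cases. Only the sample size differs, and that alone is enough to move the result from unremarkable to highly significant. A tiny bias in the data collection would do the same.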

Tuesday, November 10, 2015

Testing VE-SA aspects in marriage couples

Recently I was shown an astrological paper that tested married couples for VE-SA aspects on Gauquelin's collection of hereditary data: "Effects of Venus/Saturn aspects in marriages" (by Kyösti Tarvainen) (Correlation, Vol. 29(2), July 2014, pp 7-14)

The paper presents results with statistically significant p values. Does this mean the hypothesis is valid? Or has something else slipped in and created a small bias that produces this test result?
I noticed a few things:
1) VE-SA aspects (as defined in the paper) happen much more often in some years and are much rarer in other years. This chart shows the number of days with a VE-SA aspect by year for 1880-1920:



Periods of relatively high frequency of VE-SA aspects alternate with periods of much lower frequency. A person born in 1886-1894 (or 1903-1911) had about a 30% higher chance of having a VE-SA aspect than somebody born in 1895-1902. Could this affect the results in some way? We do know that France suffered a huge demographic shock in WWI, with millions dying and many others postponing marriage and/or childbirth. See: http://economics.ucr.edu/seminars_colloquia/2013/economic_theory/Vandenbroucke%20%20corrected%20paper%20for%204%2015%2013%20seminar.pdf
This may have interacted with the changing frequency of VE-SA aspects in the portion of the population that was hit hardest by WWI.

2) Gauquelin's hereditary data seem to have the birth data only for certain years and miss them almost completely for other years. This chart shows the number of child birth records in the collection by year:



There are 1660 birth records for 1913-14 and then almost none for 1915-18 (lost in the war?). Then there are more than 3000 for 1919-20 and again almost none for 1921-22, followed by over 8000 for 1923-24. For 1929-30 there are more than 10000 births, followed by fewer than 2000 in the next two years. This is clearly a very incomplete set of birth records that doesn't reflect the true demographic changes of those days. Doing astrological research on this set of child birth data is going to be fraught with problems. The paper uses birth of the first child as a proxy for the marriage date to test for marriage delays. This distribution of birth records makes it very unlikely that they are suitable for use as a proxy for marriage date.

I communicated those concerns to the author of the paper, but he waved them away. Perhaps in astrology we do not doubt results as long as they support our theories? Anyway, the true test for any hypothesis is always in confirming it on other data collections. And these days plenty of marriage data can be pulled from the internet. I found an almost ready-to-use collection of marriage records, including birth dates as well as the marriage date, here: http://geneaknowhow.net/script/dewit/wezembeek-oppem-croon.htm and tested them for VE-SA aspects. Because the marriage dates are included we do not need to use birth of a child as a proxy for marriage date. The results are available here as public google docs: spreadsheet1 and spreadsheet2.

The hypotheses are not confirmed. Persons with VE-SA aspects actually had a slightly smaller age difference and married a little bit quicker than average, but nothing statistically significant. Many more marriage records from Belgium can be pulled up here and here, but I will need to code a script to filter the data from them. That will be for another day.

Thursday, November 5, 2015

Rampage killings and the moon

A friend sent me a note by somebody who claimed to have found a statistically significant connection between rampage killings and the full moon. It was based on 60 rampage killers from this wikipedia page:

https://en.wikipedia.org/wiki/List_of_rampage_killers

When I looked over there I found more than 1000 cases were listed. So why test only 60?
I collected all the data on wikipedia into excel, removed the cases that have no exact date and obtained a sample of 1155 cases.
Using swisseph in excel I could easily test them for proximity to full moon. Moon phase doesn't change quickly, so I set all charts to London noon GMT.

You can download my excel sheet here: https://drive.google.com/file/d/0B38XDYuyyRE_NTZkY05tODF3Znc/view?usp=sharing
(comes as a zip file containing the excel file and a txt file with instructions for use)

I didn't find a higher frequency of rampage killings near full moon, but I found something else with a p value of about 0.000014 (binomial).
You can see it in this distribution chart (binned by 20 degree):


In the waning crescent phase of the moon (280 to 360 degrees) we have a steadily elevated frequency of rampage killings. In this phase of the moon rampage killings have happened more than 30% more frequently than in the rest of the lunar cycle. This is a large difference.
Normal expectation: 0.2222 (80/360)
Test cases:  1155
Number of successes(x): 318
Cumulative probability P(x >=318) = 0.0000137
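Anybody who wants to check these figures without excel can do so with a short script. This is a minimal sketch, not the formula used in my sheet. It computes the exact upper tail of the binomial distribution with integer arithmetic, using the counts listed above, and also the ratio of cases per degree inside and outside the 280-360 window. The function names are my own.

```python
from math import comb


def binomial_upper_tail(x, n, p_num, p_den):
    """Exact P(X >= x) for X ~ Binomial(n, p_num / p_den).

    The probability is passed as a fraction so that the sum can be
    done in exact integers before the final division.
    """
    q_num = p_den - p_num
    numerator = sum(
        comb(n, k) * p_num**k * q_num ** (n - k)
        for k in range(max(x, 0), n + 1)
    )
    return numerator / p_den**n


def rate_ratio(hits, total, window_deg, cycle_deg=360):
    """Cases per degree inside the window divided by cases per degree outside."""
    inside = hits / window_deg
    outside = (total - hits) / (cycle_deg - window_deg)
    return inside / outside


# Figures from the post: 318 of 1155 cases fall in an 80 degree window.
p_value = binomial_upper_tail(318, 1155, 80, 360)
print(f"P(x >= 318) = {p_value:.7f}")
print(f"rate ratio  = {rate_ratio(318, 1155, 80):.2f}")
```

Keep in mind that this p value describes one window that was picked after looking at the chart, so it overstates the evidence.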

That's more unusual than anything I have ever seen come out of a test of astrology, so I have been double checking over and over to find a mistake somewhere, but I can't. If you spot something then please let me know in the comments section.

Because a hypothesis cannot be verified on the same set of data that was used to derive it I will start to keep track of rampage killings as of today Nov 5, 2015. We will then see if this tendency keeps going.
I will be using the same criteria as in the wikipedia article:

"A rampage killer has been defined as follows:
A rampage involves the (attempted) killing of multiple persons at least partly in public space by a single physically present perpetrator using (potentially) deadly weapons in a single event without any cooling-off period
This list should contain every case with at least one of the following features:
  • Rampage killings with 6 or more dead (excluding the perpetrator)
  • Rampage killings with at least 4 people killed and a double digit number of victims (dead plus injured)
  • Rampage killings with at least 12 victims (dead plus injured)"
For each new case we add to the list we will use the date and set time to 12 noon London GMT, just like in the test sample. If the Moon is waning crescent (280-360 angle with the Sun, geocentric) then we have a match, otherwise not. If the hypothesis holds up then we should see more than 22.22% of matching cases (80/360). I will watch the wikipedia pages for new additions, but if you think some rampage killing has been forgotten then you are always welcome to let me know in the comments. If it fits the set criteria then we will add it.
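The matching rule can be written down as a small sketch. It assumes the geocentric ecliptic longitudes of the Sun and Moon (in degrees, for 12 noon GMT on the date of the event) have already been obtained from an ephemeris such as swisseph. The function names and the example longitudes are made up for illustration.

```python
def moon_sun_angle(moon_lon, sun_lon):
    """Angle of the Moon ahead of the Sun, in degrees from 0 up to 360."""
    return (moon_lon - sun_lon) % 360.0


def is_match(moon_lon, sun_lon, start=280.0, end=360.0):
    """True when the Moon-Sun angle falls in the waning crescent window."""
    return start <= moon_sun_angle(moon_lon, sun_lon) < end


def match_rate(cases):
    """Share of (moon_lon, sun_lon) pairs that match the window."""
    matches = sum(1 for moon_lon, sun_lon in cases if is_match(moon_lon, sun_lon))
    return matches / len(cases)


# Hypothetical longitudes, only to show the bookkeeping.
new_cases = [(10.0, 90.0), (200.0, 20.0), (350.0, 5.0), (95.0, 95.0)]
print(match_rate(new_cases), "observed versus", 80 / 360, "expected by chance")
```

Once enough new cases have accumulated, the observed share can be compared with the 22.22% chance level using a binomial test on the new cases only.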