## A Statistical Analysis of Newborn Baby Naming Trends in Relation to Disney Princess Movie Release Dates

## Introduction

Few cultural items have permeated childhood as deeply as the Disney Princess. From Snow White to Moana, Disney Princesses seem ever-present in the zeitgeist, occupying a role in our lives seemingly independent of time and place.

From its start in the early 1920s as an animation studio to its status today as a massive multimedia conglomerate spanning sports, entertainment, theme parks, and more, Disney has shaped popular culture in almost every way. In this paper, I will examine the relationship between Disney Princess names and the names of newborn babies in the United States.

While there’s fair debate about who is and who isn’t a Disney Princess, for this exploration we’ll consider only those with semi-conventional names, whose modern status comes from the Disney films of which they are a part, and whose movies were released in the last 50 years. That includes Ariel, Belle, Jasmine, Pocahontas, Mulan, Tiana, Rapunzel, Merida, Anna, Elsa, and Moana. This means we are excluding Snow White, Cinderella, Aurora (Sleeping Beauty), and Tinker Bell.

## Methodology

We’ll examine baby naming tendencies over time relative to the key date associated with each princess: the release of her movie.

| Date | Movie Release | Associated Princess(es) |
| --- | --- | --- |
| November 13, 1989 | The Little Mermaid | Ariel |
| November 22, 1991 | Beauty and the Beast | Belle |
| November 25, 1992 | Aladdin | Jasmine |
| June 23, 1995 | Pocahontas | Pocahontas |
| June 19, 1998 | Mulan | Mulan |
| December 11, 2009 | The Princess and the Frog | Tiana |
| November 24, 2010 | Tangled | Rapunzel |
| June 22, 2012 | Brave | Merida |
| November 27, 2013 | Frozen | Anna & Elsa |
| November 23, 2016 | Moana | Moana |

To refine the exploration, I framed the question as follows: does the release of a Disney Princess movie change the naming rate of the respective princess’s name between a set number of years before and the same number of years after the release year?
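The before/after comparison in this question can be sketched in Python as follows. This is a minimal illustration rather than the exact procedure used for the paper: the function name `split_windows`, its parameters, and the toy counts are all hypothetical, and I am assuming the release year itself belongs to the "after" window.

```python
def split_windows(counts_by_year, release_year, n_years):
    """Return (before, after) lists of yearly counts around a release year.

    `before` covers the n_years immediately preceding release_year.
    `after` covers release_year itself plus the following n_years - 1 years
    (an assumption about how the windows are drawn).
    Years missing from counts_by_year are treated as a count of 0.
    """
    before_years = range(release_year - n_years, release_year)
    after_years = range(release_year, release_year + n_years)
    before = [counts_by_year.get(year, 0) for year in before_years]
    after = [counts_by_year.get(year, 0) for year in after_years]
    return before, after


# Hypothetical yearly counts, for illustration only (not the SSA figures).
toy_counts = {1987: 10, 1988: 12, 1989: 40, 1990: 55}
before, after = split_windows(toy_counts, release_year=1989, n_years=2)
print(before, after)
```

The window length is left as a parameter because the amount of usable data on either side of a release differs from movie to movie.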

## Data

Data exploration began with deciding which data sets would provide the necessary information in a form conducive to this kind of analysis. Many naming data sets contained only the 2000 most popular names in any given year, but eventually I was able to find the complete US Social Security Administration data set on baby names, which includes over 32,000 names in many years.

One challenge to note is that when a name was given between 1 and 4 times in a given year, the count is reported as 0 to maintain anonymity. For most names the counts were orders of magnitude larger than 4, so this did not present an issue, but in some cases it certainly introduced uncertainty into the data.

After converting the data into an accessible form, I processed it to give the count of each occurrence of the names Ariel, Belle, Jasmine, Pocahontas, Mulan, Tiana, Rapunzel, Merida, Anna, Elsa, and Moana in each year of Social Security card data from 1979 through 2017. At first glance, this revealed some clear trends over time: some names didn’t even exist in the years preceding a movie release and then became quite common. To visualize this, I plotted name counts against years for each princess name over the entire data set.
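The counting step can be sketched as below. This is an illustration under assumptions, not the code used for the paper: it assumes each year's records are comma-separated `Name,Sex,Count` lines (the layout of the SSA national `yobYYYY.txt` files as I understand it), the function name `count_names` and the optional `sex` filter are hypothetical, and the sample lines are invented. A name absent from a year is recorded as 0, which mirrors how names given fewer than 5 times show up in the data.

```python
def count_names(lines_by_year, names, sex=None):
    """Tally yearly counts for the requested names.

    lines_by_year maps a year to an iterable of 'Name,Sex,Count' lines.
    If sex is None, counts for all sexes are summed (an assumption);
    otherwise only records with the matching sex code are counted.
    """
    counts = {name: {} for name in names}
    for year, lines in lines_by_year.items():
        # Start every name at 0 so years with no record are still present.
        for name in names:
            counts[name][year] = 0
        for line in lines:
            line = line.strip()
            if not line:
                continue
            name, line_sex, count = line.split(",")
            if name in counts and (sex is None or line_sex == sex):
                counts[name][year] += int(count)
    return counts


# Invented sample records, for illustration only (not the SSA figures).
sample = {
    1988: ["Ariel,F,900", "Ariel,M,300", "Anna,F,5000"],
    1989: ["Anna,F,4900"],
}
counts = count_names(sample, {"Ariel", "Anna"})
print(counts["Ariel"], counts["Anna"])
```

In practice `lines_by_year` would be built by reading one file per year, for example `open(f"yob{year}.txt")` for each year from 1979 through 2017.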

For most names (Ariel, Belle, Jasmine, Tiana, Anna, and Elsa), this resulted in charts showing gradual changes in the name’s popularity over time, with some showing moderate to steep peaks near the movie release date. These plots are shown below.

However, while the above plots are interesting in that a quick glimpse suggests ANOVA may give interesting results, a few other plots stood out far more, and some stood out so much that I decided to exclude their names from the analysis.

The first interesting plots are for the names Merida (from Disney’s Brave, released in 2012) and Mulan (from Mulan, released in 1998).

If you look at the sections before and after the asterisk, you can see that Mulan and Merida were essentially not in use as names until their movies. The occurrences of Mulan are small in general, so this is a case where reporting 1-4 occurrences as 0 may make the plot look more drastic than reality. Merida, however, goes from having virtually no occurrences to often over one hundred.

Some names gave data that could not be processed well and were therefore removed from the study: Pocahontas, Rapunzel, and Moana. For Rapunzel and Pocahontas, this was simply because so few babies had ever been given either name that nearly all years were reported as zero. Oddly, however, the two names saw their first counts larger than 4 in 2016 and 2017 respectively, something that would require considerable speculation to explain.

The other name removed from the analysis is Moana. While the plot clearly shows a large spike around the movie’s release, the movie is so recent (released in 2016) that only one year of data exists for the years after it was released.

## Results

To analyze whether the release of a Disney Princess movie changes the naming rate of the respective princess’s name between a set number of years before and after the release year, I examined a pair of null and alternative hypotheses for each name (Ariel, Belle, Jasmine, Tiana, Anna, and Elsa), performed a one-way analysis of variance (ANOVA), and calculated descriptive statistics.
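For reference, with two groups of $n$ yearly counts each, the one-way ANOVA quantities reported in the tables of this section reduce to the standard definitions below. The symbols are my own notation, not taken from the analysis tool: $\bar{x}_j$ and $s_j^2$ are the average and sample variance of group $j$, and $\bar{x}$ is the average over all $2n$ years.

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{j=1}^{2} n\,(\bar{x}_j - \bar{x})^2,
  & df_{\text{between}} &= 1,\\
SS_{\text{within}} &= \sum_{j=1}^{2} (n-1)\,s_j^2,
  & df_{\text{within}} &= 2n - 2,\\
F &= \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{within}} / df_{\text{within}}}.
\end{aligned}
```

The null hypothesis is rejected at significance level $\alpha$ when $F$ exceeds the critical value of the $F$ distribution with $(1,\, 2n-2)$ degrees of freedom, or equivalently when the p-value is below $\alpha$.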

### Ariel (The Little Mermaid)

For analysis of Ariel, I selected the following hypotheses:

H0: The mean yearly count of the name Ariel does not differ between the 10 years before and the 10 years after the release

H1: The mean yearly count of the name Ariel differs between the 10 years before and the 10 years after the release

The data is summarized by this table:

| Groups | Count | Sum | Average | Variance |
| --- | --- | --- | --- | --- |
| 1979-1988 | 10 | 5222 | 522.2 | 61774.4 |
| 1989-1998 | 10 | 27588 | 2758.8 | 1492813.956 |

The duration (10 years) of each data group is set by the amount of data preceding The Little Mermaid’s release (this subset of the data begins in 1979). We can see just from the data summary that ANOVA may provide interesting results.

| Source of Variation | SS | df | MS | F | P-value | F crit |
| --- | --- | --- | --- | --- | --- | --- |
| Between Groups | 25011897.8 | 1 | 25011897.8 | 32.17816178 | 0.00002218743622 | 4.413873312 |
| Within Groups | 13991295.2 | 18 | 777294.1778 | | | |
| Total | 39003193 | 19 | | | | |

From this we see an extremely low p-value of 0.00002218743622, well below alpha = 0.05, so we reject the null hypothesis and conclude that there is variation in occurrences of the name Ariel between the 10 years before and the 10 years after The Little Mermaid was released.
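As a check on the arithmetic, the sums of squares and F statistic for Ariel can be recomputed from the summary table alone. The sketch below is not the tool that produced the tables; the function name `anova_from_summary` is hypothetical, and it only uses the count, average, and variance of each group as printed above.

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups is a list of (n, mean, sample_variance) tuples, one per group.
    Returns (ss_between, ss_within, f_statistic).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group variation: group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group variation: recovered from each sample variance.
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Count, Average, and Variance copied from the Ariel summary table.
ariel = [(10, 522.2, 61774.4), (10, 2758.8, 1492813.956)]
ss_b, ss_w, f_stat = anova_from_summary(ariel)
F_CRIT = 4.413873312  # critical value copied from the ANOVA table
print(round(ss_b, 1), round(ss_w, 1), round(f_stat, 2), f_stat > F_CRIT)
```

The printed values can be compared with the table's SS of 25011897.8 and 13991295.2 and its F of about 32.18.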

### Belle (Beauty and the Beast)

For analysis of Belle, I selected the following hypotheses:

H0: The mean yearly count of the name Belle does not differ between the 12 years before and the 12 years after the release

H1: The mean yearly count of the name Belle differs between the 12 years before and the 12 years after the release

The data is summarized by this table:

| Groups | Count | Sum | Average | Variance |
| --- | --- | --- | --- | --- |
| 1979-1990 | 12 | 146 | 12.16666667 | 19.96969697 |
| 1991-2002 | 12 | 674 | 56.16666667 | 1021.606061 |

The duration (12 years) of each data group is again set by the amount of data preceding Beauty and the Beast’s release (this subset of the data begins in 1979). The ANOVA again gives interesting results.

| Source of Variation | SS | df | MS | F | P-value | F crit |
| --- | --- | --- | --- | --- | --- | --- |
| Between Groups | 11616 | 1 | 11616 | 22.30466659 | 0.0001033025434 | 4.300949462 |
| Within Groups | 11457.33333 | 22 | 520.7878788 | | | |
| Total | 23073.33333 | 23 | | | | |

From this we see an extremely low p-value of 0.0001033025434, well below alpha = 0.05, so we reject the null hypothesis and conclude that there is variation in occurrences of the name Belle between the 12 years before and the 12 years after Beauty and the Beast was released.

### Jasmine (Aladdin)

For analysis of Jasmine, I selected the following hypotheses:

H0: The mean yearly count of the name Jasmine does not differ between the 13 years before and the 13 years after the release

H1: The mean yearly count of the name Jasmine differs between the 13 years before and the 13 years after the release

The data is summarized by this table:

| Groups | Count | Sum | Average | Variance |
| --- | --- | --- | --- | --- |
| 1979-1991 | 13 | 56546 | 4349.692308 | 15347612.73 |
| 1992-2005 | 13 | 126211 | 9708.538462 | 1374329.269 |

The duration (13 years) of each data group is again set by the amount of data preceding Aladdin’s release (this subset of the data begins in 1979) and the amount available after it (the data ends in 2017). The ANOVA again gives interesting results.

| Source of Variation | SS | df | MS | F | P-value | F crit |
| --- | --- | --- | --- | --- | --- | --- |
| Between Groups | 186662008.7 | 1 | 186662008.7 | 22.32539841 | 0.00008356544002 | 4.259677214 |
| Within Groups | 200663304 | 24 | 8360971 | | | |
| Total | 387325312.7 | 25 | | | | |

From this we see an extremely low p-value of 0.00008356544002, well below alpha = 0.05, so we reject the null hypothesis and conclude that there is variation in occurrences of the name Jasmine between the 13 years before and the 13 years after Aladdin was released.

### Tiana (The Princess and the Frog)

For analysis of Tiana, I selected the following hypotheses:

H0: The mean yearly count of the name Tiana does not differ between the 10 years before and the 10 years after the release

H1: The mean yearly count of the name Tiana differs between the 10 years before and the 10 years after the release

The data is summarized by this table:

| Groups | Count | Sum | Average | Variance |
| --- | --- | --- | --- | --- |
| 1998-2007 | 10 | 7837 | 783.7 | 21671.78889 |
| 2008-2017 | 10 | 5989 | 598.9 | 30309.43333 |

The duration (10 years) of each data group is again set by the amount of data preceding The Princess and the Frog’s release (this subset of the data begins in 1979) and the amount available after it (the data ends in 2017). The ANOVA again gives interesting results.

| Source of Variation | SS | df | MS | F | P-value | F crit |
| --- | --- | --- | --- | --- | --- | --- |
| Between Groups | 170755.2 | 1 | 170755.2 | 6.569880149 | 0.01955313901 | 4.413873312 |
| Within Groups | 467831 | 18 | 25990.61111 | | | |
| Total | 638586.2 | 19 | | | | |

From this we see a p-value of 0.01955313901, below alpha = 0.05, so we reject the null hypothesis and conclude that there is variation in occurrences of the name Tiana between the 10 years before and the 10 years after The Princess and the Frog was released. Note that the average yearly count fell, from 783.7 to 598.9, so the change here is a decrease.

### Anna and Elsa (Frozen)

For analysis of Anna and Elsa, I selected the following hypotheses:

H0_A: The mean yearly count of the name Anna does not differ between the 5 years before and the 5 years after the release

H1_A: The mean yearly count of the name Anna differs between the 5 years before and the 5 years after the release

H0_E: The mean yearly count of the name Elsa does not differ between the 5 years before and the 5 years after the release

H1_E: The mean yearly count of the name Elsa differs between the 5 years before and the 5 years after the release

The data is summarized by this table:

| Name | Groups | Count | Sum | Average | Variance |
| --- | --- | --- | --- | --- | --- |
| Anna | 2008-2012 | 5 | 31664 | 6332.8 | 514477.7 |
| Anna | 2013-2017 | 5 | 25477 | 5095.4 | 213767.3 |
| Elsa | 2008-2012 | 5 | 2366 | 473.2 | 2890.7 |
| Elsa | 2013-2017 | 5 | 3263 | 652.6 | 82040.8 |

The duration (5 years) of each data group is again set by the amount of data preceding the movie’s release (this subset of the data begins in 1979) and the amount available after it (the data ends in 2017). The ANOVA results are shown below.

| Name | Source of Variation | SS | df | MS | F | P-value | F crit |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Anna | Between Groups | 3827896.9 | 1 | 3827896.9 | 10.51266236 | 0.01183635874 | 5.317655063 |
| Anna | Within Groups | 2912980 | 8 | 364122.5 | | | |
| Anna | Total | 6740876.9 | 9 | | | | |
| Elsa | Between Groups | 80460.9 | 1 | 80460.9 | 1.894724572 | 0.2059634674 | 5.317655063 |
| Elsa | Within Groups | 339726 | 8 | 42465.75 | | | |
| Elsa | Total | 420186.9 | 9 | | | | |

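From this we see a low p-value for Anna of 0.01183635874, below alpha = 0.05, so we reject the null hypothesis for Anna and conclude that there is variation in occurrences of the name Anna between the 5 years before and the 5 years after Frozen was released (the average yearly count fell from 6332.8 to 5095.4). For Elsa, the p-value of 0.2059634674 is above alpha, and F (1.89) is below F crit (5.32), so we fail to reject the null hypothesis: this test does not show a significant change in occurrences of the name Elsa over the same window.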
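The decision rule applied throughout this section (reject the null hypothesis when the p-value is below alpha = 0.05) can be written out explicitly. The sketch below simply re-applies that rule to the p-values copied from the ANOVA tables above; the helper name `decide` is hypothetical.

```python
ALPHA = 0.05

# P-values copied from the ANOVA tables in this section.
p_values = {
    "Ariel": 0.00002218743622,
    "Belle": 0.0001033025434,
    "Jasmine": 0.00008356544002,
    "Tiana": 0.01955313901,
    "Anna": 0.01183635874,
    "Elsa": 0.2059634674,
}


def decide(p_value, alpha=ALPHA):
    """Reject the null hypothesis only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decisions = {name: decide(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]:.4g} -> {decision}")
```

Under this rule, Elsa is the only name whose p-value does not fall below alpha.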

## Future Work

The results of this paper’s analysis suggest that Disney Princess names have a sizable impact on American baby names, and future work in this space would be quite interesting: perhaps quantifying the correlation and considering other variables (other key dates, generational shifts, 9-month delays).

## Conclusion

As it stands, for five of the six Disney Princess names tested (Ariel, Belle, Jasmine, Tiana, and Anna), the movie release coincides with a statistically significant change in naming rates; only Elsa, with a p-value of about 0.21, showed no significant change. While statistical analysis has not been done to characterize that correlation, graphically we do see a notable increase in most cases after the associated movie is released, although the average counts for Tiana and Anna fell. In some cases, such as Merida, names even go from 1-4 occurrences to over 100.

## Sources

https://funmoneymom.com/Disney-Princess-list/

https://www.ssa.gov/oact/babynames/