Textual Statistics of the Articles from The Sun and Daily Mail: the Average Word Length

Below is a description of how I obtained the average, median and mode. To find the average, I first found the sum of column Fx, in which each word length is multiplied by its frequency. The sum shows how many letters all the words in the sample contain. Then I divided the sum by the total number of words in the sample, which is 100. Average = (4+20+63+88+90+24+56+48+9+10+55)/100 = 467/100 = 4.67. This number shows that on average there are 4.67 letters per word in the sample. To find the median, I first checked the cumulative frequency column in the tally table.
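The average calculation can be sketched in a few lines of Python. The frequency table here is an assumption: it is recovered by dividing each Fx term by its word length, taking the lengths as 1 through 11 in order, and the name `frequencies` is only illustrative.

```python
# Frequency of each word length (assumption: recovered from the Fx terms
# 4, 20, 63, ... by dividing each term by word lengths 1 through 11 in order).
frequencies = {1: 4, 2: 10, 3: 21, 4: 22, 5: 18, 6: 4, 7: 8, 8: 6, 9: 1, 10: 1, 11: 5}

# Total number of words in the sample.
total_words = sum(frequencies.values())

# Sum of column Fx: word length multiplied by frequency, i.e. total letters.
total_letters = sum(length * count for length, count in frequencies.items())

mean_length = total_letters / total_words

print(total_words, total_letters, mean_length)  # 100 467 4.67
```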

The median is the value that splits the sample, ranked in ascending order, into two equal parts. The total cumulative frequency is 100, so half of it is 50. I need to find the word length at which the cumulative frequency first reaches 50. In the tally table, 3-letter words have a cumulative frequency of 35, while 4-letter words have a cumulative frequency of 57. This means that if I arrange all of the words from the sample in ascending order of word length, the 36th to 57th words are 4-letter words, so the words in the middle (the 50th and 51st) are 4-letter words.
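The cumulative-frequency reasoning for the median can be checked with a short sketch. It assumes the same frequency table, recovered from the Fx terms with word lengths 1 through 11; the helper names `cumulative_table` and `length_at` are hypothetical.

```python
from itertools import accumulate

# Assumption: frequencies recovered from the Fx terms, lengths 1 through 11.
frequencies = {1: 4, 2: 10, 3: 21, 4: 22, 5: 18, 6: 4, 7: 8, 8: 6, 9: 1, 10: 1, 11: 5}

def cumulative_table(freqs):
    """Return (word length, cumulative frequency) pairs in ascending length order."""
    lengths = sorted(freqs)
    return list(zip(lengths, accumulate(freqs[length] for length in lengths)))

def length_at(position, table):
    """Word length of the word at a 1-based position in the ranked sample."""
    for length, cumulative in table:
        if cumulative >= position:
            return length
    raise ValueError("position exceeds sample size")

table = cumulative_table(frequencies)
n = table[-1][1]  # total cumulative frequency, 100 here

# For an even sample size the median is the mean of the two middle values.
middle = [n // 2, n // 2 + 1] if n % 2 == 0 else [(n + 1) // 2]
median_length = sum(length_at(p, table) for p in middle) / len(middle)

print(table[2], table[3], median_length)  # (3, 35) (4, 57) 4.0
```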

Therefore, the median of the sample is 4. To find the mode, I need to check which word length occurs most frequently in the sample. I referred to column 2 (Frequency) in the table above and found that 4-letter words occur most often (their frequency of 22 is the highest in the column). Therefore, the mode of the sample is 4.
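Finding the mode amounts to picking the word length with the largest frequency. A minimal sketch, again assuming the frequency table recovered from the Fx terms with word lengths 1 through 11:

```python
# Assumption: frequencies recovered from the Fx terms, lengths 1 through 11.
frequencies = {1: 4, 2: 10, 3: 21, 4: 22, 5: 18, 6: 4, 7: 8, 8: 6, 9: 1, 10: 1, 11: 5}

# The mode is the word length whose frequency is the highest.
mode_length = max(frequencies, key=frequencies.get)

print(mode_length, frequencies[mode_length])  # 4 22
```

Note that `max` returns only the first key with the highest frequency, so a sample with two equally frequent word lengths would need a separate check for ties. Under the assumed table, 3-letter words come a close second with a frequency of 21.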

I can conclude that there are some differences between the two articles. The mean, median and mode are all slightly higher for “Daily Mail” than for “The Sun”. However, this difference is small. In each article, there is a similar tendency towards a high frequency of short words. Words of up to 5 letters (inclusive) account for 75% of all words for “The Sun” and for 61% of all words for “Daily Mail”. So, based on this sample, we can argue that the author of the “Daily Mail” article on average uses longer words than the author of the article in “The Sun”.
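The share of short words can be computed from a frequency table as sketched below. The table is an assumption recovered from the Fx terms of the worked example (word lengths 1 through 11); it reproduces the 75% figure quoted for “The Sun”, so it presumably belongs to that article. The “Daily Mail” frequencies are not given in this section, so no figure is computed for them. The function name `share_up_to` is hypothetical.

```python
# Assumption: frequencies recovered from the Fx terms of the worked example;
# they appear to describe the sample from "The Sun".
frequencies = {1: 4, 2: 10, 3: 21, 4: 22, 5: 18, 6: 4, 7: 8, 8: 6, 9: 1, 10: 1, 11: 5}

def share_up_to(freqs, max_length):
    """Percentage of words whose length is at most max_length."""
    total = sum(freqs.values())
    short = sum(count for length, count in freqs.items() if length <= max_length)
    return 100 * short / total

sun_share = share_up_to(frequencies, 5)

print(sun_share)  # 75.0
```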
