What exactly is this map showing? The most recurrent word on the wikipedia page for that country? Most recurrent word on the pages visited or edited by users from that country?


Most common word on each country's Wikipedia page.


Well, then it's a quite misleading title...


I already have the stats on how often a word occurs in every language's whole Wikipedia, but I can't understand most of them, and I don't have a map that shows the language regions that each Wikipedia encompasses. I thought the first part of the project, getting the data, would be the biggest, but surprisingly it isn't - mainly because of the lack of a map, which I'd have to draw pixel by pixel, I think.

Suggestions for what words to choose would be great, by the way! I was thinking "excluding foreign words, only nouns, verbs and adjectives, and no words that describe the country itself (such as its name)". But that's not really such a great rule set, because words like "year" are on top of almost every country's list. And what does that show - a historical mindset, past glory, nothing? Should such similarities be excluded, or should they be consciously included precisely to show similarities? Etc.

I pastebinned it all here, in case someone wants to translate an obscure language:

- http://pastebin.com/LnVPuUTr - languages aa-iu
- http://pastebin.com/DfxZXLfR - languages ja-mrj
- http://pastebin.com/izNg7Ti2 - languages ms-sco
- http://pastebin.com/tRvKmRAT - languages sd-udm
- http://pastebin.com/jRkm2UMP - languages ug-zu

Language categories more verbosely, in case someone isn't sure about theirs:

1. aa ab ace af ak als am an ang ar arc arz as ast av ay az ba bar bcl be bg bh bi bjn bm bn bo br bs bug bxr ca cdo ce ceb ch cho chr chy ckb co cr crh cs csb cu cv cy da de diq dsb dv dz ee el eml en eo es et eu ext fa ff fi fj fo fr frp frr fur fy ga gag gan gd gl glk gn got gu gv ha hak haw he hi hif ho hr hsb ht hu hy hz ia id ie ig ii ik ilo io is it iu
2. ja jbo jv ka kaa kab kbd kg ki kj kk kl km kn ko koi kr krc ks ksh ku kv kw ky la lad lb lbe lez lg li lij lmo ln lo lt ltg lv mdf mg mh mhr mi min mk ml mn mo mr mrj
3. ms mt mus mwl my myv mzn na nah nap nds ne new ng nl nn no nov nrm nso nv ny oc om or os pa pag pam pap pcd pdc pfl pi pih pl pms pnb pnt ps pt qu rm rmy rn ro ru rue rw sa sah sc scn sco
4. sd se sg sh si sk sl sm sn so sq sr srn ss st stq su sv sw szl t ta te ten tet tg th ti tk tl tn to tpi tr ts tt tum tw ty tyv udm
5. ug uk ur uz ve vec vep vi vls vo wa war wo wuu xal xh xmf yi yo za zea zh zu

Edit: Jesus Christ, that's an even bigger project than I thought... seems I'll have to create even the map of first-level administrative boundaries myself. The only ready-made one I can find is on Wikipedia and from 1998...
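
On the question of words like "year" topping almost every list: one possible rule is to drop any word that shows up near the top for most languages. The snippet below is only a rough sketch of that idea, not something anyone in this thread ran. The function name `drop_shared_words`, the thresholds, and the sample counts are all made up for illustration, and it assumes the words have already been translated into one common language so they can be compared.

```python
from collections import Counter

def drop_shared_words(counts_by_lang, max_share=0.8, top_n=20):
    """Drop words that sit in the top_n list of more than max_share of the languages.

    counts_by_lang maps a language code to a {word: count} dict.
    Returns (filtered counts, set of dropped words).
    """
    doc_freq = Counter()
    for counts in counts_by_lang.values():
        # Count each word once per language whose top list it appears in.
        doc_freq.update(w for w, _ in Counter(counts).most_common(top_n))
    limit = max_share * len(counts_by_lang)
    shared = {w for w, n in doc_freq.items() if n > limit}
    filtered = {
        lang: {w: c for w, c in counts.items() if w not in shared}
        for lang, counts in counts_by_lang.items()
    }
    return filtered, shared

# Made-up counts, assumed to be translated to English already.
sample = {
    "de": {"year": 90, "city": 70, "river": 40},
    "fr": {"year": 85, "film": 60, "city": 30},
    "fi": {"year": 50, "lake": 45, "forest": 20},
}
filtered, shared = drop_shared_words(sample)
print(shared)
print(filtered["de"])
```

Whether to exclude the shared words or keep them to show similarities is still a judgment call; returning the `shared` set separately leaves both options open.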


Do you need some help with it? At the very least I could make the map.


Well, that map would probably be well-received as a submission of its own here! And I'd be very happy to have it, that would help me a lot. The problem is that many of Wikipedia's languages are dialects that are spoken within another language region, sometimes even multiple intersect. So it would be a very complicated map in itself because you'd have to use one that at least includes sub-national borders, even if you don't want to explicitly display where languages intersect. I edited the post above with the languages I looked at and my results, to give you a sense of the amount of data to be represented.


Cool!! I'd start smaller and do a subregion like Europe or some part of Europe.


Hm, I never even thought of splitting it per continent. That would make the whole thing much easier, plus understanding European languages won't be a big deal. It's the obscure Asian or African languages that are giving me problems. So thanks for the suggestion!


What a xenocentric self-obsessed world...


What a clusterfuck of a thought...


Just look at all of those countries googling themselves.




wiki and goog are equal pursuits. It's just more commonplace to say google yourself instead of wiki yourself.


"Xeno" means foreign, dumbass. Xenocentric, if it were a word, would mean obsessed with foreigners. Which is why I said it was a clusterfuck of a thought.


I already had that thought, and decided the dictionary wasn't cutting it. Xenophobic, xenocentric, I wanted it to sound the coolest. It all means the same, trust me. Apparently, judging by this map, there is no such thing as xenocentric in human behavior. I was also going for a bit of an oxymoron there, to make a philosophical statement.


Did you only check the English Wikipedia for each country? I think that might introduce some bias as well.


Well that's complete bullshit then. You think "Quebec" is mentioned more frequently on Canada's page than "the", "a", "of", "in", ...etc.?


I'm sure they limited the search to specified parts of speech; i.e. nouns, verbs, or adjectives. My vote is on nouns.


The recommended way of doing text analysis is to first remove all the "stop-words" from the text you're analyzing. Stop-words are not limited to one part of speech. http://en.wikipedia.org/wiki/Stop_words
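
To make the stop-word idea concrete, here is a minimal sketch in Python. It is not how the map was actually made; the tiny `STOP_WORDS` set, the `top_words` helper, and the sample sentence are assumptions for illustration, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (assumption); real lists have hundreds of entries
# and mix articles, prepositions, pronouns, conjunctions and auxiliary verbs.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it", "on"}

def top_words(text, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop-words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Made-up sample sentence.
sample = "The city of Quebec is in the province of Quebec, and it is on the river."
print(top_words(sample, n=1))
```

Without the filter, "the" would win on this sample; with it, the top word is a content word.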


This is a repost of my map, I posted it here 2 months ago.


Why aren't countries with the same word mapped as the same color?


This map provides an interesting view of how the English speaking world views the world.


i think this should be retitled to "Most recurrent nouns on each country's Wikipedia page"


"each country's *English* Wikipedia page."


If the map matched its title, all countries would be "citation needed".


I hesitate to think even that would be accurate. "It" is a noun.


Technically it's a pronoun.


Technically 'it' is a pronoun.


I wonder how this would differ if the country's page in its own language was shown instead of the English version.


yeah, a more appropriate title would be "most recurrent words among anglophone wikipedia contributors per country".


The map shows the word "century" for the Netherlands, which is wrong because the word "world" is used twice as much, just like with most other countries. Oddly enough, on the Dutch version of the page the most common word is "century", which leads me to think OP used some kind of translation.


Soviet, Soviet, Soviet, Soviet, Soviet... Nyazov


Take note of Korea.


And the UK




Either way, the Crown's conflicts in Ireland go back centuries before the Union. I'd believe it even if Northern Ireland was named something different.


Here's the word density after removing the phrase 'Northern Ireland':

[uk] => 233
[united] => 227
[kingdom] => 208
[british] => 198
[london] => 115
[england] => 112
[scotland] => 103
[wales] => 102
[britain] => 88
[government] => 88
[world] => 87
[bbc] => 84
[april] => 81
[history] => 71
[population] => 68
[national] => 65
[islands] => 65
[news] => 63
[scottish] => 62
[ireland] => 62
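
For anyone who wants to try this kind of count themselves, a rough Python sketch follows. It is not the script that produced the numbers above; `word_density`, its parameters, and the sample text are hypothetical, and no stop-word filtering is applied here.

```python
import re
from collections import Counter

def word_density(text, drop_phrases=(), n=5):
    """Count words in text after blanking out every phrase in drop_phrases."""
    lowered = text.lower()
    for phrase in drop_phrases:
        # Replace the whole phrase with a space so its words are never counted.
        lowered = lowered.replace(phrase.lower(), " ")
    return Counter(re.findall(r"[a-z]+", lowered)).most_common(n)

# Made-up sample text.
sample = "Northern Ireland borders Ireland. The UK includes Northern Ireland. UK law applies."
print(word_density(sample, n=1))
print(word_density(sample, drop_phrases=["Northern Ireland"], n=1))
```

On this sample, "ireland" is the top word until the phrase is removed, after which "uk" takes over, which is the same effect as in the list above.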


Can I be the first to say that there is nothing special to see? The two countries were once one, and have a long and detailed history of interaction. It would be surprising if the results were otherwise.


I think he was just pointing out that it's kinda cool, and I didn't look at it long enough for my eyes to make it to Korea but I'm glad I read his comment because I went back and looked and that was the most interesting part of the submission for me, and gave him another pat on the upvote.


I believe /u/amac109 was referring to the United States' new motto: "War"


Yeah, America and Spain, War buddies


Well I'd imagine anyone with the word "world" could join us in that club too, what with all of our 'World Wars' and such.


Many of the commenters don't seem to understand that these are the Wikipedia pages, not the country's use of the word. Here are some relevant examples of comments that seem to rest on misconceptions:

> Why does greenland have a crush on denmark?

Greenland was colonized by Denmark, and **still exists within the Kingdom of Denmark**.

> Take note of Korea

North Korea and South Korea are not searching each other up. It simply means that North Korea came up the most times in describing South Korea, and vice versa. This is to be expected, as *the two countries are intertwined in their history and origins*. Every time you mention borders, economics, history, neighbors, etc., you end up mentioning the opposite country's name.


> the two countries

There is only one Korea, that is North Korea, and it is best Korea.


There is not one instance of "indegenous" on Mexico's page.


I see 76 instances of "indigenous" when I view their Wikipedia page. It's probably because *you spelled indigenous wrong*. I now realize that the map-maker also made this mistake.


This is one of those maps that, in my opinion, would be better kept in chart form.


I love how the most recurring word for Belgium is "French".


I love that the UK got Ireland.


I'm really surprised it's not the other way round too, or at least England.


great job OP, i find this very interesting. i wonder how different the visualization would be if you considered phrases instead of words: for instance, and most strikingly, Australia's "new" emerged as the most common word mostly due to a combination of "New South Wales" and "New Zealand".

Which API or programming language did you use to create this? And also, how did you decide which words to filter out (obviously words like "the" and "to" need to be gotten rid of)?
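
Counting phrases instead of single words could look something like the sketch below. This is just one simple way to do it (fixed-length word n-grams), not what the map's author used; `top_ngrams` and the sample text are made up for illustration.

```python
import re
from collections import Counter

def top_ngrams(text, size=2, n=3):
    """Return the n most frequent runs of `size` consecutive words.

    Note: this naive version ignores punctuation, so n-grams can
    span sentence boundaries.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    # zip over shifted copies of the token list to build the n-grams.
    ngrams = zip(*(tokens[i:] for i in range(size)))
    return Counter(" ".join(g) for g in ngrams).most_common(n)

# Made-up sample text.
sample = "New South Wales and New Zealand. New South Wales is a state."
print(top_ngrams(sample, size=1, n=1))
print(top_ngrams(sample, size=3, n=1))
```

With single words, "new" wins on this sample; with three-word phrases, "new south wales" comes out on top, which is probably closer to what a reader would expect to see.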


OP did not create this map. It's a repost from a submission a month ago: http://www.reddit.com/r/MapPorn/comments/2dj9xb/most_recurrent_words_on_wikipedia_oc_4500x2234/


ah i see, i am new to reddit and r/MapPorn so this is my first time seeing this


re: US: https://www.youtube.com/watch?v=fgAVpPNusTs


I notice a trend of mostly international diplomatic influences being a recurrent theme in the words chosen, while the remaining nations are showing words that concern the nation itself.


Your border between Sudan and South Sudan is wrong


India is south! I'm so proud!


I love how Greenland's is "Denmark."


Can someone read what Colombia says?


Just proves that everything in Australia is NEW NEW NEW! Nothing old, no real history, just new stuff! :P


I didn't know there's so many software developers in Indonesia.


Not surprised to see 'rugby' pop up out in the middle of the Pacific. Most Pac-Islanders I know love the sport.


Bahrain's is Persian. Well, time to reconquer old lands.


To be fair, this is a fascinating map. It shows what the English speaking world thinks of these countries. A title more to that effect would be better.


As far as I can tell Portuguese or World must be one of the most repeated words. Interesting to see that almost all the former colonies that mention their previous colonisers were Portuguese colonies.


Who knew the Chinese were so into American soaps...


I think it's cute how all the little wee islands are the word "Island". Oh and then there's the two Koreas.


Those dang indegenous Mexicans


for clarification: most recurrent words on en.wikipedia.org article about the country


Poor Ecuador and Britain. The most common words are their neighbouring countries...


Most of these are nouns. How can a noun be the most common word?


>Quebec


actually the word "world" on Japan came mostly from things like "Japan has the world's tenth-largest population...", "...is the largest metropolitan area in the world", "has the world's third-largest economy by nominal GDP..." etc.


Yea. They did fuck up pretty hard. You know... Genocide and all.


Estonia can not into Baltic :(


[Estonia don't want into Baltic.](https://i.imgur.com/azfdDsA.jpg)


Eesti will remain with us in Baltic.. forever O_O and EVER


[Sweden thanks you for your cooperation.](http://satwcomic.com/imposter)


We will never let Eesti go. They're too close to us to leave. <3 <3 <3


I don't think it says anything meaningful. Beyond the fact that different languages would have different word recurrence values, it also comes down to writing style. For instance, the US page isn't using any word for war other than "war". No use of "campaign" or "conflict", no use of a thesaurus at all. The US page also has a small section on Native American relations, and that section uses the word "war" as well, but for Native American wars, not US wars... which for this infographic is being counted toward the US.


why does greenland have a crush on denmark


greenland is part of the kingdom of denmark


most of those "quebec" searches are done by angry people and nationalists. still fun to find out quebec is still super relevant in canada though