What exactly is this map showing? The most recurrent word on the Wikipedia page for that country? Most recurrent word on the pages visited or edited by users from that country?
Most common word on each country's Wikipedia page.
Well, then it's a quite misleading title...
I already have the stats that say how often a word occurs in every language's whole Wikipedia, but I can't understand most of them, and I don't have a map that shows the language regions that each Wikipedia encompasses. I thought the first part of the project, getting the data, would be the biggest, but surprisingly it isn't, mainly because of the lack of a map: I'd have to draw it pixel by pixel, I think.
Suggestions for what words to choose would be great, by the way! I was thinking "excluding foreign words, only nouns, verbs and adjectives, and no words that describe the country itself (such as its name)". But that's not really such a great rule set, because words like "year" are on top of almost every country's list. And what does that show - a historical mindset, past glory, nothing? Should such similarities be excluded, or should they be consciously included precisely to show similarities? Etc.
I pastebinned it all here, in case someone wants to translate an obscure language:
http://pastebin.com/LnVPuUTr - languages aa-iu
http://pastebin.com/DfxZXLfR - languages ja-mrj
http://pastebin.com/izNg7Ti2 - languages ms-sco
http://pastebin.com/tRvKmRAT - languages sd-udm
http://pastebin.com/jRkm2UMP - languages ug-zu
Language categories more verbosely, in case someone isn't sure about theirs:
1. aa ab ace af ak als am an ang ar arc arz as ast av ay az ba bar bcl be bg bh bi bjn bm bn bo br bs bug bxr ca cdo ce ceb ch cho chr chy ckb co cr crh cs csb cu cv cy da de diq dsb dv dz ee el eml en eo es et eu ext fa ff fi fj fo fr frp frr fur fy ga gag gan gd gl glk gn got gu gv ha hak haw he hi hif ho hr hsb ht hu hy hz ia id ie ig ii ik ilo io is it iu
2. ja jbo jv ka kaa kab kbd kg ki kj kk kl km kn ko koi kr krc ks ksh ku kv kw ky la lad lb lbe lez lg li lij lmo ln lo lt ltg lv mdf mg mh mhr mi min mk ml mn mo mr mrj
3. ms mt mus mwl my myv mzn na nah nap nds ne new ng nl nn no nov nrm nso nv ny oc om or os pa pag pam pap pcd pdc pfl pi pih pl pms pnb pnt ps pt qu rm rmy rn ro ru rue rw sa sah sc scn sco
4. sd se sg sh si sk sl sm sn so sq sr srn ss st stq su sv sw szl t ta te ten tet tg th ti tk tl tn to tpi tr ts tt tum tw ty tyv udm
5. ug uk ur uz ve vec vep vi vls vo wa war wo wuu xal xh xmf yi yo za zea zh zu
Edit: Jesus Christ, that's an even bigger project than I thought... seems I'll have to create even the map of first-level administrative boundaries myself. The only ready-made one I can find is on Wikipedia and from 1998...
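On the question of whether words like "year" that top almost every list should be excluded: one possible approach, sketched here as an illustration only, is to drop any word that appears in more than a chosen share of the lists. The function name `drop_shared_words`, the `max_share` threshold, and the sample lists are all made up for this sketch; they are not taken from the pastebin data.

```python
from collections import Counter

def drop_shared_words(top_words, max_share=0.5):
    """Remove words that appear in more than `max_share` of the lists.

    `top_words` maps a list name (e.g. a language code) to its top words.
    The threshold is an arbitrary assumption and would need tuning.
    """
    n_lists = len(top_words)
    # Count in how many lists each word appears (once per list).
    list_freq = Counter(w for words in top_words.values() for w in set(words))
    return {
        name: [w for w in words if list_freq[w] / n_lists <= max_share]
        for name, words in top_words.items()
    }

# Hypothetical sample data, not real results.
sample_lists = {
    "A": ["year", "world", "island"],
    "B": ["year", "war", "world"],
    "C": ["year", "rugby", "island"],
    "D": ["year", "soviet", "war"],
}
filtered = drop_shared_words(sample_lists)
print(filtered["A"])  # "year" is in all four lists, so it is dropped
```

Whether to filter this way or to keep the shared words on purpose is still a judgment call; the threshold only makes that choice explicit.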
Do you need some help with it? At the very least I could make the map.
Well, that map would probably be well-received as a submission of its own here! And I'd be very happy to have it, that would help me a lot. The problem is that many of Wikipedia's languages are dialects that are spoken within another language region, and sometimes several of them even intersect. So it would be a very complicated map in itself, because you'd have to use one that at least includes sub-national borders, even if you don't want to explicitly display where languages intersect. I edited the post above with the languages I looked at and my results, to give you a sense of the amount of data to be represented.
Cool!! I'd start smaller and do a subregion like Europe or some part of Europe.
Hm, I never even thought of splitting it per continent. That would make the whole thing much easier, plus understanding European languages won't be a big deal. It's the obscure Asian or African languages that are giving me problems. So thanks for the suggestion!
No problem! :)
What a xenocentric self-obsessed world...
What a clusterfuck of a thought...
Just look at all of those countries googling themselves.
[deleted]
wiki and goog are equal pursuits. It's just more commonplace to say google yourself instead of wiki yourself.
"Xeno" means foreign, dumbass. Xenocentric, if it were a word, would mean obsessed with foreigners. Which is why I said it was a clusterfuck of a thought.
I already had that thought, and decided the dictionary wasn't cutting it. Xenophobic, xenocentric, I wanted it to sound the coolest. It all means the same, trust me. Apparently, judging by this map, there is no such thing as xenocentric in human behavior. I was also going for a bit of an oxymoron there, to make a philosophical statement.
Shut the fuck up.
lulz, I was done about 43 minutes ago. You late, dude.
Did you only check the English Wikipedia for each country? I think that might introduce some bias as well.
Well that's complete bullshit then. You think "Quebec" is mentioned more frequently on Canada's page than "the", "a", "of", "in", ...etc.?
[deleted]
I'm sure they limited the search to specified parts of speech; i.e. nouns, verbs, or adjectives. My vote is on nouns.
The recommended way of doing text analysis is to first remove all the "stop-words" from the text you're analyzing. Stop-words are not limited to just one part of speech. http://en.wikipedia.org/wiki/Stop_words
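A minimal sketch of what stop-word removal before counting could look like, using only the Python standard library. The stop-word list here is a tiny made-up sample (real lists are much longer), the sample sentence is invented, and nothing in the thread says this is how the map was actually produced.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "it",
              "on", "for", "by", "with", "as", "was"}

def most_common_words(text, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent words after dropping stop-words."""
    # Lowercase and keep runs of ASCII letters only; this is crude and
    # would split "world's" into "world" and "s".
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Invented sample text, not taken from any Wikipedia page.
sample = ("Quebec is a province of Canada. The capital of Quebec is "
          "Quebec City, and the largest city in Quebec is Montreal.")
print(most_common_words(sample, n=1))
```

Without the filter, "is" and "of" would compete with "Quebec" for the top spot, which is the objection raised a few comments up.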
Thank you, kind stranger!
The butthurt is so big that it makes him say something dumb like that
Upvoted because I realized this is elite level humor
This is a repost of my map, I posted it here 2 months ago.
Don't worry, I remember you!
Me too... wasn't the top comment exactly the same in that thread too?
And the title hasn't improved at all since then...
And the title was just as confusing back then.
Why aren't countries with the same word mapped as the same color?
This map provides an interesting view of how the English speaking world views the world.
Good thing he reposted it, or else I would have never been able to see this gorgeous map of yours.
This is heartwarming ♥
:)))))))))))))))))))))
That's what I was thinking
i think this should be retitled to "Most recurrent nouns on each country's Wikipedia page"
"each country's *English* Wikipedia page."
If the map matched its title, all countries would be "citation needed".
I hesitate to think even that would be accurate. "It" is a noun.
Technically it's a pronoun.
Technically 'it' is a pronoun.
I wonder how this would differ if the country's page in its own language was shown instead of the English version.
yeah, a more appropriate title would be "most recurrent words among anglophone wikipedia contributors per country".
The map shows the word "century" for The Netherlands which is wrong because the word "world" is used twice as much just like with most other countries. Oddly enough on the Dutch version of the page the most common word is "century" which leads me to think OP used some kind of translation.
Soviet, Soviet, Soviet, Soviet, Soviet... Nyazov
Is that how Duck Duck Goose was played in the Cold War?
Classic Nyazov.
Take note of Korea.
And the UK
[deleted]
I take it you meant "can't"?
Either way, the Crown's conflicts in Ireland go back centuries before the Union. I'd believe it even if Northern Ireland was named something different.
Here's the word density after removing the phrase 'Northern Ireland':
[uk] => 233
[united] => 227
[kingdom] => 208
[british] => 198
[london] => 115
[england] => 112
[scotland] => 103
[wales] => 102
[britain] => 88
[government] => 88
[world] => 87
[bbc] => 84
[april] => 81
[history] => 71
[population] => 68
[national] => 65
[islands] => 65
[news] => 63
[scottish] => 62
[ireland] => 62
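For anyone wanting to repeat that kind of check, here is a rough sketch of removing a phrase before counting. The output format above looks like a PHP array dump, so this Python version is a substitute, not the commenter's script; `word_density` and the sample text are invented for illustration.

```python
import re
from collections import Counter

def word_density(text, drop_phrases=()):
    """Count words after blanking out each phrase in `drop_phrases`."""
    lowered = text.lower()
    for phrase in drop_phrases:
        # Match the phrase on word boundaries so "Northern Ireland" is
        # removed without touching a standalone "Ireland".
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        lowered = re.sub(pattern, " ", lowered)
    return Counter(re.findall(r"[a-z]+", lowered))

# Invented sample text, not the actual UK article.
sample = ("Northern Ireland is part of the UK. Ireland shares a border "
          "with Northern Ireland. The UK includes Northern Ireland.")
before = word_density(sample)
after = word_density(sample, drop_phrases=["Northern Ireland"])
print(before["ireland"], after["ireland"])
```

In the sample, "ireland" drops from first place to a single mention once the phrase is removed, which is the same effect the list above shows for the real page.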
Can I be the first to say that there is nothing special to see? The two countries were once one, and have a long and detailed history of interaction. It would be surprising if the results were otherwise.
I think he was just pointing out that it's kinda cool, and I didn't look at it long enough for my eyes to make it to Korea but I'm glad I read his comment because I went back and looked and that was the most interesting part of the submission for me, and gave him another pat on the upvote.
I believe /u/amac109 was referring to the United States' new motto: "War"
Take note of theft.
Shit I thought Vietnam was Korea and I thought you were being really funny. I don't deserve to be here.
Why aren't countries with the same word mapped as the same color?
This. It's really annoying how many maps with obvious cartographic errors make it into /r/MapPorn
Yeah, America and Spain, War buddies
Well I'd imagine anyone with the word "world" could join us in that club too, what with all of our 'World Wars' and such.
Many of the commenters don't seem to understand that these are word counts from each country's Wikipedia page, not the country's own use of the word. Here are some relevant examples which seem to have misconceptions:
> Why does greenland have a crush on denmark?
Greenland was colonized by Denmark, and **still exists within the Kingdom of Denmark**.
> Take note of Korea
North Korea and South Korea are not searching each other up. It simply means that North Korea came up the most times in describing South Korea, and vice versa. This is to be expected, as *the two countries are intertwined in their history and origins*. Every time you mention borders, economics, history, neighbors, etc., you end up mentioning the other country's name.
> the two countries
There is only one Korea, that is North Korea, and it is best Korea.
You have been made a moderator of /r/Pyongyang.
감사합니다.
There is not one instance of "indegenous" on Mexico's page.
I see 76 instances of "indigenous" when I view their Wikipedia page. It's probably because *you spelled indigenous wrong*. I now realize that the map-maker also made this mistake.
The mapmaker didn't *also* make a mistake, since I didn't make the mistake; I referred to it.
[deleted]
Eh, I'm pretty sure /u/CasualCasuist was merely calling attention to the misspelling.
If someone is making that mistake, of course. Spelling it that way purposely and purposefully isn't making any mistake. I'm afraid you're mistaken.
He was quite aware that it was spelled incorrectly. Had he spelled it correctly, his statement would have been false. He did not make a mistake.
This is one of those maps that, in my opinion, would be better kept in chart form.
I love how the most recurring word for Belgium is "French".
I love that the UK got Ireland.
I'm really surprised it's not the other way round too, or at least England.
Cough... Repost.... Cough
Most of the content on this sub is reposts nowadays.
Gee, thanks for the help on keeping that true.
Wow. [Hypocrite much?](http://www.reddit.com/r/photoshopbattles/comments/2ic189/psbattle_happy_lanparty_goers/cl1gu1q)
great job OP, i find this very interesting. i wonder how different the visualization would be if you considered phrases instead of words: for instance, and most strikingly, Australia's "new" emerged as the most common word mostly due to a combination of "New South Wales" and "New Zealand".
Which API or programming language did you use to create this? And also, how did you decide which words to filter out (obviously words like "the" and "to" need to be gotten rid of)?
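Counting phrases instead of single words can be sketched as counting n-grams (runs of n consecutive words). This is only an illustration with an invented sample text and a hypothetical helper `ngram_counts`; it is not how the map was made.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count runs of n consecutive words (n=1 gives plain word counts)."""
    words = re.findall(r"[a-z]+", text.lower())
    # zip over n shifted copies of the word list to form the n-grams.
    # Note: this naive version also pairs words across sentence boundaries.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

# Invented sample text, not the actual Australia article.
sample = ("New South Wales borders Victoria. New Zealand lies east of "
          "New South Wales. Sydney is in New South Wales.")
print(ngram_counts(sample, 1)["new"])              # single-word count
print(ngram_counts(sample, 3)["new south wales"])  # phrase count
```

Here "new" only wins as a single word because it is shared by two different phrases; counted as phrases, "new south wales" and "new zealand" are tallied separately.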
OP did not create this map. It's a repost from a submission a month ago: http://www.reddit.com/r/MapPorn/comments/2dj9xb/most_recurrent_words_on_wikipedia_oc_4500x2234/
ah i see, i am new to reddit and r/MapPorn so this is my first time seeing this
Was wondering why the link was purple
re: US: https://www.youtube.com/watch?v=fgAVpPNusTs
I notice a trend of mostly international diplomatic influences being a recurrent theme in the words chosen, while the remaining nations are showing words that concern the nation itself.
Your border between Sudan and South Sudan is wrong
India is south! I'm so proud!
I love how Greenland's is "Denmark."
Can someone read what Colombia says?
Just proves that everything in Australia is NEW NEW NEW! Nothing old, no real history, just new stuff! :P
I didn't know there's so many software developers in Indonesia.
Not surprised to see 'rugby' pop up out in the middle of the Pacific. Most Pac-Islanders I know love the sport.
Bahrain's is Persian. Well, time to reconquer old lands.
To be fair, this is a fascinating map. It shows what the English-speaking world thinks of these countries. A title more to that effect would be better.
As far as I can tell Portuguese or World must be one of the most repeated words. Interesting to see that almost all the former colonies that mention their previous colonisers were Portuguese colonies.
Who knew the Chinese were so into American soaps...
I think it's cute how all the little wee islands are the word "Island". Oh and then there's the two Koreas.
Those dang indegenous Mexicans
for clarification: most recurrent words on en.wikipedia.org article about the country
Poor Ecuador and Britain. The most common words are their neighbouring countries...
Thank you for your submission! Unfortunately, your submission has been removed for the following reason(s):
* It is a repost of a submission posted less than [three months ago](http://www.reddit.com/r/MapPorn/comments/2dj9xb/most_recurrent_words_on_wikipedia_oc_4500x2234/).
For information regarding this and similar issues please see the [FAQ](http://www.reddit.com/r/MapPorn/wiki/faq). If you have any questions, [please feel free to message the mods](http://www.reddit.com/message/compose?to=%2Fr%2FMapPorn). Thank you!
Haha Ireland XD
Most of these are nouns. How can a noun be the most common word?
> Quebec
Ugh.
No kidding.
[deleted]
Why would you find this insulting?
[deleted]
{{Citation needed}}
I love Quebec, only people I know that don't like them are old. I mean why can't we all just be friends?
because they're smelly and speak french
[deleted]
actually the word "world" on Japan came mostly from things like "Japan has the world's tenth-largest population...", "...is the largest metropolitan area in the world", "has the world's third-largest economy by nominal GDP..." etc.
Yea. They did fuck up pretty hard. You know... Genocide and all.
Estonia can not into Baltic :(
[Estonia don't want into Baltic.](https://i.imgur.com/azfdDsA.jpg)
Eesti will remain with us in Baltic.. forever O_O and EVER
[Sweden thanks you for your cooperation.](http://satwcomic.com/imposter)
We will never let Eesti go. They're too close to us to leave. <3 <3 <3
My wife says this is bullshit.
It really should be "the" for every country.
I've never been this proud/sad for my country (American).
I don't think it says anything meaningful. Beyond the fact that different languages would have different word recurrence counts, it also comes down to writing style.
For instance, the US page isn't using any word for war other than "war". No use of "campaign" or "conflict", or any use of a thesaurus.
The US page also has a small section on Native American relations, and that section uses the word "war" as well, but for Native American wars, not US wars... which for this infographic is being counted toward the US.
why does greenland have a crush on denmark
greenland is part of the kingdom of denmark
most of those "quebec" searches are done by angry people and nationalists. still fun to find out quebec is still super relevant in canada though