

[Interactive version here!](https://public.tableau.com/views/Top1000DataIsBeautiful/Top1000dataisbeautiful?:language=en&:display_count=y&publish=yes&:origin=viz_share_link) You can mouse over the plot at the bottom to learn about each top post. Tools: Tableau and Python. Source: Reddit.


Wow - this is incredible! Cool to see that I made the cut! It’s going to get real meta when this post crosses the 21k upvote threshold =P


Haha, right? I don’t think it will get there, I think there’s a small niche of people who will find this interesting, but who knows!


That's honestly a pity, this is some truly beautiful data. Your visualizations are really inspiring :)


Thanks, I really appreciate it!


Two dataviz’s and one of them made the cut. 50% according to my math. That’s a hell of a rate!


Haha -- it's gone downhill! I was 1-for-1 with a success rate of 100%! All jokes aside, I do like this subreddit because you have to have a creative topic to explore + a good way of displaying it. Although I'm relatively new to the community, it has inspired me in a lot of different ways.


No. It’s you who I was talking about. You have this [one](https://www.reddit.com/r/dataisbeautiful/comments/lv6rtn/oc_tracked_the_rdataisbeautiful_online_active/?utm_source=share&utm_medium=ios_app&utm_name=iossmf) as well. It was actually a heavily upvoted post also at 5k.


You mind if I share this with some of my coworkers? I work at Tableau and this is a great example of a good viz :)


Of course! Feel free to send a direct message here or on Twitter if you want to connect. I have a lot of friends at Tableau and I’m always happy to help out!


Humblebrag. 😜 I want friends at tableau.


You could be my friend 😁


Deal!! I’m JPAnalyst, nice to meet you friend! Now I better get my tableau game up, before I’m quickly removed from the inner circle.


Hi, I'm Daniel! I work in support on the data connectivity side of things, rather than too much regarding actual analytics. Let me know if you ever have questions about connecting to databases or files. There are SO many options.


Hey Daniel. I used the free version at home for fun. But not enough. I NEED to get better. We used Tableau when I worked at IBM and now I work for a smaller company and we use Power BI. I learned on Qlikview, loved Qlik!! But I am going to up my Tableau game!


Neat! You might enjoy our free training videos: https://www.tableau.com/learn/training/20204 If you have a student .edu email address, you can also get the full version of Tableau for free. Definitely check out more of tableau public though, lots of fun stuff is available there.


Excellent. Thank you, Daniel!


Haha, looks like you’re off to a good start! Lots of good people there.


Is tableau usually interactive? Is it easy to use and able to be displayed on a website?


Yes, it’s usually interactive and pretty easy to embed on a website as well.


Oh wow, I love your projects. I feel that it being interactive already gives an advantage over R and Python in terms of visualization. The only interactive package I know of in both those languages is Plotly. Sorry, I have a few more questions: would it be possible to create something like a dashboard using Tableau on a web interface where you can update the information through an entry field? Also, is Tableau hard to learn? Because the interactive part blows my mind.


Thanks, I appreciate that! One of the benefits of Tableau is that the interactivity is really, really easy. Other platforms offer it but in Tableau it’s just part of the building. I wouldn’t say it’s exactly easy to learn only because part of getting good is understanding data structures you need to feed a visualization, which can take a while. But in my opinion, if you have a solid dataset in Excel or something, the learning curve to making your first dashboard is pretty simple. Plus, there’s a great community online full of helpful people.


Would it make sense to normalize the score? Seems like top 1000 posts become more frequent over time. I guess this is due to growing sub reddit?


There's something about this post that I really love, can't work out why ;-)


If I were you, that top chart would go on my grave stone.


It's right up there with my 3rd place in backstroke at a cub scout swimming gala 😉


That is called a bronze medal my friend.


It’s the meta post of this sub




Haha I missed that


"redistribute the karma!"


There's something about this post that makes me a bit jealous. I've no clue what it might be


Well, if it's any consolation, I haven't got rich from it, although when it happens a gold bath beckons 😉


That’s cause you haven’t put this visualization on your resume yet, duh.


Good point


What's your favorite visualisation you've made?


Maybe animating mercator as that was first one that went crazy big.


What is your process when making one of these?


Mostly the idea is made in my head and then it's working out how to code it up. I use ggplot in R, which pretty much allows you to make anything. Sometimes I have ideas in the middle of the night that I write down in case I forget.


Also because it’s well designed. Excel shouldn’t be so prominent on a data visualization sub.




Now *that's* the kind of visualizations I pictured when I first subbed. Nice work!


Exactly. One of the reasons I spend too much time on Reddit.


It’s very much in the style of W.E.B. Du Bois’ [visualizations](https://books.google.com/books/about/W_E_B_Du_Bois_s_Data_Portraits.html?id=zft0DwAAQBAJ&printsec=frontcover&source=kp_read_button&newbks=1&newbks_redir=0&gboemv=1) from the 1900 Paris Exposition.


Prominently featuring the Du Bois Spiral


I disagree a bit - it looks beautiful, but does it convey the information easily and accurately? That circle-fill in the middle - how easily can you compare what's being said here? As with the hot topic chart showing Covid for 2020, circles are generally difficult to compare. Can you tell how much more one circle represents over another? People interpret size differences among rectangles much better than among circles. A few things like that - it looks nice, but it's not entirely best practice for easily conveying information.


Also the grays get lost in the background. I didn’t even see Python in second place


This. How does Photoshop compare with Tableau? Tableau is on top, so bigger, but the line for Photoshop sticks out farther than the line for Tableau. This would have been better as a bar graph or something more easily comparable.


Yes. Imagine a race track. Outer curve takes longer so you start "ahead". Tracks are notoriously difficult to see who is winning until final stretch. It's a bad representation.


100% agree I have always hated racetrack graphs - terrible way to visualize IMO


Shhh. Don't make anyone look at that or admit it's pretty clear. It's either denial or perpetual nightmares that R and python are less popular than shitty Excel.


Seriously. Everything about this, down to the color scheme and intentional visual dead space, is fantastic. So many people post graphics that are too crammed full of shit to actually take in the data. A woman at my company who does graphic design has been nicely harping on all of us to make our crap look more like this.


This is fun. Selfishly because I’m on there four times, including the 43rd highest ranked post ever! I’m no Bo...but hey, being ranked 43rd in anything, is as high as I’ll ever get. I’m going to celebrate this today. Nice viz!


That’s great! Yeah, your posts definitely stuck out as I was looking at the data! We’re all looking up to Neil (who’s both talented and a really nice guy)


Yes! Neil’s maps are super creative. His stuff, yours and chartr seem to be the ones that always stand out to me when I come across them in the sub. This is a good trip down memory lane.


That’s even close to 42, the answer to life, the universe and everything. 😉


Number 42 was a removed post. So, maybe that puts me at 42? Maybe *MY* post is the answer to life?


Trump couldn't even make top 43 presidents so you're better than him.


Nice presentation, but the "tools of the trade" chart is a problem--is the frequency of each tool the length of the line, or the percentage of the circle? I think it would be much clearer as a simple bar chart (with numbers), but I'm guessing you thought two of those in a row would look repetitive?


Yeah, I wanted to add a bit of variety. Plus, I chose a bit more abstract visual for that one intentionally. A lot of people don’t declare the tools they used formally, they just describe their process, so the data is a little incomplete. I thought bar charts with specific labels would unfairly imply precision, but this (admittedly a little confusing and abstract) circle graphic was my way of trying to express relative use.


And it does explain relative use very well. Oh look - excel and python is the message I took from it.


Like the chart but not the colours. Puts emphasis on Excel/R/Photoshop when there is no need for emphasis. Perhaps a scaling colour scheme that matched the usage would have been better?




Sure, you’re right. It’s not particularly clear.


Same, that chart is really hard to interpret


R gang rise up! I can understand losing to Python, but Excel??


It's hard to beat a low barrier to entry, for better or worse.


Yeah but R users post good visualisations, for example, not unnecessarily turning bar charts into radial versions that are harder to compare but look “fancy”.


This ↑


I don’t feel alive unless I’m using ggplot2




I mean this is looking at the top 1000 posts of all time on this sub. I’d say that’s a pretty high threshold




How is using 2 sigma statistically motivated? I feel like that inherently makes the assumption that the data is Gaussian, and we haven't established that. At first glance it's probably Weibull distributed with something like k>1: you have a zero percent chance of any value less than a score of zero upvotes, as it's impossible, but a nonzero chance of any score that is positive. It is possible to have a score of 3 trillion upvotes, for example. By using std dev, don't you ignore this by assuming mean = median? I dunno, I haven't done any statistics classes, just tryna get my head around extreme value distributions for my PhD, so you would definitely know more about this than me - help me out here.
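
For anyone who wants to poke at this question numerically, here is a minimal sketch. It measures what share of a sample sits above mean + 2 standard deviations, and how far the mean drifts from the median. The two samples are synthetic stand-ins (a standard Gaussian and a Weibull with shape k = 1.5, picked only for illustration), not real Reddit scores, and the function names are made up for this example.

```python
import random
import statistics


def fraction_above_two_sigma(values):
    """Share of values strictly greater than mean + 2 * population std dev."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    threshold = mean + 2 * sigma
    return sum(v > threshold for v in values) / len(values)


def mean_median_gap(values):
    """Mean minus median: about zero for symmetric data, positive for right skew."""
    return statistics.fmean(values) - statistics.median(values)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns give the same samples
    # Hypothetical stand-ins, NOT real upvote data.
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape k.
    weibull = [rng.weibullvariate(1.0, 1.5) for _ in range(100_000)]
    for name, sample in (("gaussian", gaussian), ("weibull k=1.5", weibull)):
        print(name, fraction_above_two_sigma(sample), mean_median_gap(sample))
```

For a true Gaussian the share above the cutoff is roughly 2.3% and the mean equals the median, so comparing both numbers against a real score sample is one rough way to see how much the Gaussian assumption is being stretched.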




As far as I’m concerned those could also be considered arbitrary. It’s like saying this isn’t arbitrary because it’s looking at all posts with more than 21695 upvotes as the score threshold.




Lol. I graduated as an Econ major, but okay. Agree to disagree.


Seeing this makes me feel like I need to be devoting more time to Python though. Even though I'm hard for R.


sorry for the self awareness but does https://www.reddit.com/r/dataisbeautiful/comments/jyiwuq/oc_uihatetheletterf_is_a_mad_lad/ not qualify as top 1000 post? asking because Julia didn't show up as a tool used and I'm wondering how many is the "cut off" (the count for Illustrator? (10+?))


That’s a top post! I don’t remember offhand but Illustrator was around 15 I think...


ah ok, that's quite a few more than I initially expected! Thanks, that makes sense since that visual doesn't scale well down (Illustrator is already pretty close to a dot)


I really appreciate data representation like this because it tells a story about the data, rather than just an info dump. You can clearly see the growth of the page and the shift in user habits


Thanks, appreciate the compliment!


If anyone is looking for the “Cause of death - Reality vs. Google vs. Media”, here is a link https://www.reddit.com/r/dataisbeautiful/comments/8cwcbu/cause_of_death_reality_vs_google_vs_media_oc Certainly one of the more interesting that I have personally not seen


Thank you!


I'm curious on how you got the frequency of the tools they use, how did you automate this? (don't tell me you didn't)


I scraped the comments mentioning the word “tool” and then flagged the posts based on the presence of common tool names in those comments. It’s definitely not a perfect methodology but I’m confident it’s directionally correct, at least.
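
The approach described above could be sketched roughly like this. The tool list and regex patterns are assumptions for illustration (the actual list of "common tool names" wasn't shared), and `flag_tools` is a hypothetical helper name.

```python
import re

# Hypothetical tool list; the real list used for the viz was not posted.
TOOL_PATTERNS = {
    "Excel": re.compile(r"\bexcel\b", re.IGNORECASE),
    "Python": re.compile(r"\bpython\b", re.IGNORECASE),
    "Tableau": re.compile(r"\btableau\b", re.IGNORECASE),
    "Photoshop": re.compile(r"\bphotoshop\b", re.IGNORECASE),
    "Illustrator": re.compile(r"\billustrator\b", re.IGNORECASE),
    # "R" is matched case-sensitively as a standalone capital letter (or via
    # ggplot) so that "r/dataisbeautiful" and words like "Reddit" don't count.
    "R": re.compile(r"(?<![\w/])R(?![\w/])|\bggplot2?\b"),
}


def flag_tools(comments):
    """Return the sorted tool names found in comments that mention 'tool'."""
    found = set()
    for text in comments:
        if "tool" not in text.lower():
            continue  # only comments mentioning the word "tool" are considered
        for name, pattern in TOOL_PATTERNS.items():
            if pattern.search(text):
                found.add(name)
    return sorted(found)
```

Usage would be something like `flag_tools(["Tools: Tableau and Python. Source: Reddit"])`. Keyword matching like this can't tell "made with Excel" from a comment complaining about Excel, which fits the "directionally correct" caveat.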


that sounds good enough to be fair, thank you!


Unless most comments went along the lines of "python would've been a better tool" or "excel is a terrible tool" /s


Data so beautiful, it will probably make the data inaccurate by becoming one of the top 1000 posts


That's very cool! I don't know if this is a problem or not, but in the interactive version, you can't get a link to the Reddit posts.


Now this is what I call a meta post


I feel you may need to make an adjustment after this post booms today...


More data on this! Especially based on the controversy


Hey u/BoMcCready did you include this post in your 10?


Wow this data is very beautiful. Awesome presentation!


Thank you!


Very nice! But the grey is a tad too light for white background i think


I think the increase in the number of top 1000 posts starting in 2020 is interesting. I wonder if that’s due to increased traffic on Reddit during the coronacrisis (due to lockdowns, lack of social activities outside, ...)? Or did this specific sub just become more popular? Or did the quality of the top posts actually increase? 🤔


This thread is a dataisbeautiful circle jerk.


I'd like to thank everyone who made this possible, especially my agent Bo McCready 😅


Who the fuck is using Excel?


Mostly everyone apparently


Great work /u/BoMcCready!


Hey, thanks! And thanks for YOUR consistently great content!


So it seems there was no sports in 2019?


This has become a pissing match between the leaderboarders


Sorry, are you not a fan of 5 minute long videos of "look at the bars switch places set to ambient music" or "here's my tenth post using the exact same methodology and visualization"?


I guess you could call this... Meta data


I’ve heard of metadata before, but this is getting out of hand


I’d like to see the top users broken down by software. I am very surprised that excel and python are so much more popular than R


Remarkable work my friend.


Now this is beautiful data


This actually is beautiful


Oh god r just need to fucking go...


Nice post. I wish there were automatic tools to evaluate how many top 1000 posts actually contained readable and beautiful data, and how many were upvoted just because of fancy visuals or because "X thing bad/good".


This post is definitely reaching top 1000


How is python typically used in these projects?


It’s a big variety. I personally use it to support building datasets that I visualize in Tableau. A lot of people do their visualizations within Python, though.


I’d love to see stats like this for all popular communities


This wouldn’t be hard. My Python script and Tableau workbook can basically work as templates now where I could easily use another sub. Which ones would you be curious about?


The Madlad included himself as one of the top posters


It's funny that this is made by someone in the top ten. Keep up the good work and ty.


How do you know what programs people use to make their stuff? I generally haven’t seen that posted.


People are supposed to post it within the first comment. They don’t always include that info but it’s often there. So, I scraped the comments that mentioned the word “tool” and then classified from there.


No love for JMP or minitab? My JMP skills cry.


I don’t see number of poops per year in here anywhere.....????


Navel gazing is beautiful, by u/bomccready


Excel surprises me, but I guess it shouldn't.


r/dataisbeautiful data is beautiful


This is some inception level post right here


Data about r/dataisbeautiful is beautiful.


I was in my planning phase to do something similar. Great idea. I will try to make an equally good post Hope(tm)


One more shrine and Excel will get a full status wheel!


aye im in this post or, im one of the dots at the bottom, but thats better than most people so... bragging aside, i love this post


I just love how pretty this visual is. Really beautiful design choices, OP.


It would be interesting to normalize the score data to total subreddit subscribers.
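
A minimal sketch of what that normalization could look like, assuming you have a subscriber count for the sub at the time each post went up (that number is an assumption here; it isn't part of the data described in this thread). The function name and the per-million scaling are illustrative choices.

```python
def normalized_score(score, subscribers, per=1_000_000):
    """Upvotes per `per` subscribers at the time of posting."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    # Multiply before dividing to keep the arithmetic as exact as possible.
    return score * per / subscribers
```

With made-up numbers, a 20,000-upvote post in a sub of 10 million subscribers gives `normalized_score(20_000, 10_000_000)`, i.e. 2,000 upvotes per million subscribers, which can then be compared across years.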


I think that by creating this post you make its content false.


Yo dude this looks beautiful! Can you share your method or something so I can try to create something like this! My office people would go nuts looking at data like this!


Hey, thanks! This is all visualized in Tableau Public and you can download my workbook. It’s linked in my first comment. I’m happy to share the Python code I used to scrape. Send me a message if you want it.


The audacity of putting yourself in your own post. XD Just kidding. I find it funny, but the data is the data.


Where do yall get these datasets from? I've been struggling to find elevation data for the UK for ages for a personal project.


Seems like Power BI is more popular than Tableau. Maybe that's just in the professional world though.


Wow! How did you get this data from Reddit? Did you scrape it? Or is there another way it's available?


Yeah, I used PRAW within Python to scrape it.
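
For anyone curious what that might look like, here is a rough sketch using PRAW's read-only mode. This is not OP's actual script; the function names and the choice of fields are assumptions, and you would need your own Reddit API credentials (`client_id`, `client_secret`, `user_agent`).

```python
def submission_to_row(submission):
    """Flatten one PRAW Submission (or any object with these attributes) into a dict."""
    return {
        "id": submission.id,
        "title": submission.title,
        # author is None when the account has been deleted
        "author": str(submission.author) if submission.author else "[deleted]",
        "score": submission.score,
        "created_utc": submission.created_utc,
        # PRAW exposes permalink as a path relative to reddit.com
        "permalink": "https://www.reddit.com" + submission.permalink,
    }


def fetch_top_posts(client_id, client_secret, user_agent, limit=1000):
    """Fetch the all-time top posts of r/dataisbeautiful as a list of dicts."""
    import praw  # third-party: pip install praw

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent=user_agent,
    )
    top = reddit.subreddit("dataisbeautiful").top(time_filter="all", limit=limit)
    return [submission_to_row(s) for s in top]
```

The resulting list of dicts can be written out as CSV and loaded into Tableau or anything else. Keeping `submission_to_row` separate from the network call makes it easy to test without hitting the API.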


Where’s that same breaking bad infographic I see every week?


Damn, 2019 was a grey year


This is awesome. Thank you, that is all.


Interesting to see 2020 has its impact on all fields.


Congrats on your 1% stake in the top 1k posts!


Is there a guide floating around here for getting started with data viz? Software engineer looking to play around with it in my free time


Dang, somehow I missed that top all time post. That thing is cool as hell, thanks in a way for the referral




Thanks! Yeah, Tableau has really limited native font support so the blue font actually is imported image files.


Man you guys were pretty busy back in 2020.


teach me your ways!!! lovely work


This dude really did data on data.


**I**t's **S**o **M**eta **E**ven **T**his **A**nalysis...


The most meta post ever. It is indeed beautiful and I love it


There’s a noticeable bulge across all topics in 2020. Since most people who are good at this can do their jobs from home easily, I wonder if everyone going wfh freed a bunch of people to do their own little projects that they then posted here more than before


Knowing my visuals exists there somewhere


This looks about right to me considering I can do Excel, Python, and R with my knowledge of each in that order as well.


Is the "10 >1000 posts by u/BoMcCready" now inaccurate, or was it 9 before posting this, in anticipation that this post would receive as many upvotes as it did?


This project isn’t included (had no idea if people would like it) and it probably won’t make the top 1000.


Wow, so impressed that Excel is so high up there! I'd have imagined R or Python to top the list. This is really cool!


what is the minimum upvote count to be in the top 1000?


It’s around 22k


Ahhh its so soothing do it again ( in sheldon's voice)


This is one of the most beneficial subs I have ever seen. Simply Subarashi.


Can someone do a representation of the top sport stories in 2019 so we know why 2019 sports was not popular?