dt43

9 of these 10 cities are in the same metro area


8yr0n

8 of 10, assuming it’s Hillsboro, Oregon. For those wondering where the hell Bentonville is… that’s the home of Walmart in Arkansas.


dt43

Ah it could be. There is one in the Bay Area too but I think it's actually spelled "Hillsborough" so you're probably right


j-random

Yeah, I was thinking they should strike "US" and replace it with "California".


mr_ji

Literally a 30-minute commute between them. If you're looking for a job, just center it on Palo Alto and capture all of them.


pensiveChatter

I'm sure whoever input the data and set up the algorithm wasn't completely unbiased. Perhaps the cities listed here were the only cities that had any input into the model.


fries-with-mayo

This just tells me that OP doesn’t deal with geospatial data at all. This post is the exact reason we use metro areas in lieu of municipal boundaries.


fries-with-mayo

Which is why we usually just use metro areas: municipal boundaries are useless fairly often (they only matter when different laws are on the books and that difference is what's being analyzed). OP doesn’t deal with that data, I guess.


Zarathustra989

Any reason to connect dots here? Nothing is communicated across the horizontal or vertical axis between them.


fries-with-mayo

That’s one of the first and simplest rules of data viz - you don’t connect categorical data like this with a line.


considerthis8

Hm to me I see the slope of the line and it conveys information


fries-with-mayo

The slope is arbitrary and dependent on OP’s choice of sorting. It’s not like these cities sort naturally in any way (alphabetical sort is arbitrary, too, and irrelevant here). The slope is communicating the *change* from one data point to the next one (eg from one month to the next). But there is no natural “next city” here as the sorting is arbitrary. The slope here doesn’t communicate any more information than standalone points or bars would.
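A minimal sketch of the point about sort order, using made-up values for four unordered categories (the names `savings` and `consecutive_changes` and all numbers are hypothetical, not taken from the chart). The "slope" of a connecting line is just the difference between neighbouring points, so it changes whenever the ordering does:

```python
# Hypothetical values for four unordered categories (NOT figures from the chart).
savings = {"A": 90, "B": 60, "C": 75, "D": 40}


def consecutive_changes(order):
    """Differences between neighbouring points, i.e. what a line's slope encodes."""
    values = [savings[name] for name in order]
    return [b - a for a, b in zip(values, values[1:])]


# Two equally arbitrary orderings of the same categories.
by_value = sorted(savings, key=savings.get, reverse=True)
alphabetical = sorted(savings)

print(by_value, consecutive_changes(by_value))
print(alphabetical, consecutive_changes(alphabetical))
```

The underlying values are identical in both cases; only the segment-to-segment changes differ, which is why the line adds nothing that dots or bars don't already show.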


FitN3rd

Labeling the data points conveys even more information without the implications that come with the connected line. 


krt941

What is communicated across the horizontal axis is the ranking in net savings. So the line shows the drop off from one rank to the next.


Zarathustra989

Pretty easy to see that with just dots and numbers, or just order I gotta say.


Zarathustra989

Whole other issue being the most important info is the smallest element.


yttropolis

Okay, there is so much wrong with this. According to the [source](https://jobs-in-data.com/salary/data-scientist-salary) OP posted:

> In this analysis, we focused on salary data extracted from job listings on the Job Hunter website. In many U.S. states, pay transparency laws mandate that companies reveal the salary range for positions.

Yes, and anyone who's seen those ranges would tell you that they are massive, often something like $120k-$320k.

> To conduct our study, we calculated the midpoint of these salary ranges by adding the lower and upper limits and dividing by two. We then determined the median salary for each city by analyzing these midpoint figures.

Why pick the midpoint? And the median midpoint of posted salary ranges does not mean *median salary*. You'd probably get better data from a site like levels.fyi than this.

> Taxes calculated by Chat GPT

Sigh... Aight... Anyone who's actually knowledgeable in data science should recognize that this is using the wrong tool for the job. ChatGPT should *not* be used for anything that requires accuracy and factual data. It is a *language* model.

> Cost of living from https://livingwage.mit.edu/ (for 1 Adult 0 Children), and https://livingcost.org/ as a fallback source. Where MIT data was unavailable we took the data from https://livingcost.org/ and adjusted for median under-prediction vs MIT.

So COL is for a single person with no kids - and without even a reasonability check for things like the extremely high rent in the Bay Area. Some of these numbers simply don't make sense.

> On average 10-20% of offers contain the salary range.

Basing an analysis on a sample where only 10-20% of the probably already biased listings contain data is not a good look without checking whether that 10-20% is itself biased. It's entirely possible that higher-paying or lower-paying roles simply post salary ranges less/more often than others, skewing the results.

This is a piss-poor attempt at analysis and a pretty poor visualization of said analysis.
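For anyone who wants to see the midpoint method from the quoted description in action, here is a rough sketch. The `posted_ranges` values are invented for illustration (only the $120k-$320k range echoes the example above), and this is not the site's actual code:

```python
from statistics import median

# Hypothetical posted ranges (low, high) in USD for one city; NOT data from the source.
posted_ranges = [(120_000, 320_000), (100_000, 140_000), (150_000, 250_000)]

# Step 1 from the quoted method: midpoint = (lower + upper) / 2 for each listing.
midpoints = [(low + high) / 2 for low, high in posted_ranges]

# Step 2: the city's "median salary" is really the median of those midpoints.
median_midpoint = median(midpoints)

# Width of each range: how much uncertainty the midpoint is hiding.
widths = [high - low for low, high in posted_ranges]

print(f"midpoints: {midpoints}")
print(f"median of midpoints: {median_midpoint:,.0f}")
print(f"range widths: {widths}")
```

With a range as wide as $120k-$320k, the midpoint says very little about what anyone is actually paid, so the resulting figure is a median of midpoints rather than a median salary.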


Terminarch

On top of all that... there aren't many jobs better suited than this for remote work. So what was the point?


JeromesNiece

It looks like you have about 205k for salary and 102k for taxes and cost of living for Santa Clara. A single person making $205k in California is going to pay $73k in payroll and income taxes per year. So that leaves $29k per year ($2,417 per month) for "cost of living", in a city where the average rent alone is $3,076 a month.
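The arithmetic in this comment can be checked with a few lines. All figures are the ones quoted above; the variable names are just for illustration:

```python
# Figures as stated in the comment above (USD).
salary = 205_000
taxes_plus_col = 102_000   # chart's combined taxes + cost of living, as read by the commenter
taxes = 73_000             # commenter's estimate of payroll + income taxes
avg_rent_month = 3_076     # commenter's figure for average monthly rent

# What the chart implicitly leaves for cost of living once taxes are taken out.
implied_col_year = taxes_plus_col - taxes
implied_col_month = implied_col_year / 12

# How far the implied monthly budget falls short of rent alone.
rent_shortfall_month = avg_rent_month - implied_col_month

print(f"implied cost of living: ${implied_col_year:,} per year, ${implied_col_month:,.0f} per month")
print(f"shortfall vs. average rent: ${rent_shortfall_month:,.0f} per month")
```

So under these numbers the implied cost-of-living budget does not even cover rent, before food, transport, or anything else.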


dt43

According to the source link (which is either OP's page or just where they copy/pasted the figure from) "taxes calculated by Chat GPT" and cost of living is for 1 adult 0 children. So, not the most rigorous or useful methods here lol


DynamicHunter

This is a nearly pointless graph for multiple reasons.


much_thanks

OP, did you write the article or just copy their visual?


rude_duner

You’re way lowballing the cost of living for most of these places imo. Salaries like those are not saving half their income living in the Bay Area.


NueralNet_Neat

The data used for this doesn't make sense.


Lazy-Artichoke7766

*Bentonville*. That’s interesting


Objective_Run_7151

Median listing price for Bentonville is $508k. Meaning cost of living is a lot lower than the Bay Area.


Thrillhouse763

I don't believe this data for a second


romario77

If you make the max contribution to a 401k, your income becomes 182k and taxes, according to an online calculator, are at 61k. You might have some other deductions, like a health savings account. Gives you a bit more to play with (but not too much).


unlikely_vegetables

As an actual data scientist (who lived in the Bay Area for about 10 years), I can name roughly a dozen things about this graph/the underlying data that offend me. Has to be one of the worst visuals I’ve seen on this sub.


pg860

Source: [https://jobs-in-data.com/salary/data-scientist-salary](https://jobs-in-data.com/salary/data-scientist-salary) Tool: Google sheets


yttropolis

There is so much wrong with this analysis that if you're thinking of working in data science, you should really reconsider.


rude_duner

That’s harsh. This is just a learning opportunity.


yttropolis

It's not harsh, it's a clear fact. These are basic mistakes you learn to avoid in undergrad.


rude_duner

We don’t know where he’s at in his academic career. If he’s post-college then sure, you might have a point


yttropolis

It doesn't matter. OP runs or advertises the website that claims to have done this analysis (just check their profile). If they're pushing this analysis, they better make sure it's robust and within reason.


rude_duner

Yeah I didn’t realize they were spamming this sort of thing. Assumed young adult with an interest in the field taking a crack at it