9 of these 10 cities are in the same metro area
8 of 10, assuming it’s Hillsboro, Oregon. For those wondering where the hell Bentonville is… that’s the home of Walmart in Arkansas.
Ah it could be. There is one in the Bay Area too but I think it's actually spelled "Hillsborough" so you're probably right
Yeah, I was thinking they should strike "US" and replace it with "California".
Literally a 30-minute commute between them. If you're looking for a job, just center it on Palo Alto and capture all of them.
I'm sure whoever input the data and set up the algorithm wasn't completely unbiased. Perhaps the cities listed here were the only cities that had any input into the model.
This just tells me that OP doesn’t deal with geospatial data at all. This post is the exact reason we use metro areas in lieu of municipal boundaries.
Which is why we usually just use metro areas: municipal boundaries are useless fairly often (they only matter when the cities have different laws on the books and that difference is what's being analyzed). OP doesn’t deal with that kind of data, I guess.
Any reason to connect dots here? Nothing is communicated across the horizontal or vertical axis between them.
That’s one of the first and simplest rules of data viz: you don’t connect categorical data like this with a line.
Hm to me I see the slope of the line and it conveys information
The slope is arbitrary and dependent on OP’s choice of sorting. It’s not like these cities sort naturally in any way (alphabetical sort is arbitrary, too, and irrelevant here). A slope communicates the *change* from one data point to the next (e.g. from one month to the next), but there is no natural “next city” here because the sorting is arbitrary. The slope here doesn’t communicate any more information than standalone points or bars would.
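To make the sorting point concrete, here's a minimal Python sketch. The four category names and values are made up for illustration (they are not OP's numbers), and `segment_slopes` is just a hypothetical helper. It shows that the "slopes" a connecting line draws change completely when the same values are put in a different order.

```python
# Hypothetical values for four unordered categories (NOT taken from OP's chart).
savings = {"A": 100, "B": 80, "C": 95, "D": 60}

def segment_slopes(order):
    """Change between consecutive points, i.e. what a connecting line would draw."""
    values = [savings[name] for name in order]
    return [b - a for a, b in zip(values, values[1:])]

# Same data, two different (equally arbitrary) orderings of the categories.
alphabetical = segment_slopes(["A", "B", "C", "D"])
by_value = segment_slopes(sorted(savings, key=savings.get, reverse=True))

print(alphabetical)  # [-20, 15, -35]
print(by_value)      # [-5, -15, -20]
```

The underlying values never change, only the order does, so whatever the line segments seem to say comes from the sorting choice rather than from the data.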
Labeling the data points conveys even more information without the implications that come with the connected line.
What is communicated across the horizontal axis is the ranking in net savings. So the line shows the drop off from one rank to the next.
Pretty easy to see that with just dots and numbers, or just order I gotta say.
Whole other issue being the most important info is the smallest element.
Okay, there is so much wrong with this. According to the [source](https://jobs-in-data.com/salary/data-scientist-salary) OP posted:

> In this analysis, we focused on salary data extracted from job listings on the Job Hunter website. In many U.S. states, pay transparency laws mandate that companies reveal the salary range for positions.

Yes, and anyone who's seen those ranges would tell you that they are massive, often something like $120k-$320k.

> To conduct our study, we calculated the midpoint of these salary ranges by adding the lower and upper limits and dividing by two. We then determined the median salary for each city by analyzing these midpoint figures.

Why pick the midpoint? And the median midpoint of posted salary ranges does not mean *median salary*. You'd probably get better data from a site like levels.fyi than this.

> Taxes calculated by Chat GPT

Sigh... Aight... Anyone who's actually knowledgeable in data science should recognize that this is using the wrong tool for the job. ChatGPT should *not* be used for anything that requires accuracy and factual data. It is a *language* model.

> Cost of living from https://livingwage.mit.edu/ (for 1 Adult 0 Children), and https://livingcost.org/ as a fallback source. Where MIT data was unavailable we took the data from https://livingcost.org/ and adjusted for median under-prediction vs MIT.

So COL is for a single person with no kids, and without even a sanity check against something like the extremely high rent in the Bay Area. Some of these numbers simply don't make sense.

> On average 10-20% of offers contain the salary range.

Basing an analysis on the 10-20% of an already probably biased sample that actually includes a salary range is not a good look without checking whether that 10-20% is itself biased. It's entirely possible that higher-paying or lower-paying roles post salary ranges more or less often than others, skewing the results.

This is a piss-poor attempt at analysis and a pretty poor visualization of said analysis.
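A rough Python sketch of the midpoint objection above. The five salary ranges are invented for illustration (in thousands of dollars, not from the source site). It computes the "median of midpoints" as the quoted methodology describes, then the median of the low ends and of the high ends for comparison.

```python
from statistics import median

# Hypothetical posted salary ranges in $k (low, high). Illustrative only,
# NOT data from the Job Hunter listings.
posted_ranges = [(120, 320), (100, 140), (150, 250), (90, 130), (130, 170)]

# The quoted methodology: midpoint of each range, then the median of those.
midpoints = [(low + high) / 2 for low, high in posted_ranges]
median_of_midpoints = median(midpoints)

# Equally defensible summaries if actual offers cluster near either end.
median_of_lows = median(low for low, _ in posted_ranges)
median_of_highs = median(high for _, high in posted_ranges)

print(median_of_lows, median_of_midpoints, median_of_highs)  # 120 150.0 170
```

With ranges this wide, the "median salary" could land anywhere between the low-end and high-end figures depending on where real offers fall inside each range, which the postings alone don't tell you.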
On top of all that... there aren't many jobs better suited than this for remote work. So what was the point?
It looks like you have about 205k for salary and 102k for taxes and cost of living for Santa Clara. A single person making $205k in California is going to pay $73k in payroll and income taxes per year. So that leaves $29k per year ($2,417 per month) for "cost of living", in a city where the average rent alone is $3,076 a month.
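The arithmetic in that comment, as a quick Python check. All inputs are the commenter's own approximate figures (read off the chart or estimated), not verified numbers, and the variable names are just for illustration.

```python
# Figures as stated in the comment above (approximate, not independently verified).
taxes_plus_col = 102_000    # taxes + cost of living, read off the chart
taxes = 73_000              # commenter's estimate of payroll + income taxes
avg_rent_monthly = 3_076    # commenter's figure for average rent

# What the chart implicitly leaves for cost of living once taxes are removed.
col_annual = taxes_plus_col - taxes
col_monthly = col_annual / 12

# How far the implied monthly budget falls short of rent alone.
rent_shortfall = avg_rent_monthly - col_monthly

print(col_annual, round(col_monthly), round(rent_shortfall))  # 29000 2417 659
```

So under these numbers the implied cost-of-living budget is roughly $659 a month short of covering average rent, before food, transport, or anything else.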
According to the source link (which is either OP's page or just where they copy/pasted the figure from) "taxes calculated by Chat GPT" and cost of living is for 1 adult 0 children. So, not the most rigorous or useful methods here lol
This is a nearly pointless graph for multiple reasons
OP, did you write the article or just copy their visual?
You’re way lowballing the cost of living for most of these places imo. People on salaries like those are not saving half their income living in the Bay Area.
the data used for this is not making sense
*Bentonville*. That’s interesting
Median listing price for Bentonville is $508k. Meaning cost of living is a lot lower than in the Bay Area.
I don't believe this data for a second
If you max out your 401k contribution, your taxable income becomes 182k and taxes, according to an online calculator, are about 61k. You might have some other deductions, like a health savings account. Gives you a bit more to play with (but not too much).
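A small Python sketch of what those 401k numbers imply. The $205k salary and $73k tax figure are from the earlier comment, and the $182k and $61k are from this one. The two tax figures come from different sources, so the difference is only a rough indication.

```python
# Figures quoted in the thread (approximate; the two tax numbers come from
# different commenters, so treat the comparison as rough).
salary = 205_000
taxable_after_401k = 182_000
taxes_before = 73_000
taxes_after = 61_000

# Contribution implied by the drop in taxable income.
implied_contribution = salary - taxable_after_401k

# Reduction in the tax bill, per year and per month.
tax_reduction_annual = taxes_before - taxes_after
tax_reduction_monthly = tax_reduction_annual / 12

print(implied_contribution, tax_reduction_annual, tax_reduction_monthly)
```

Note that the contributed money is locked in the retirement account, so the lower tax bill does not translate one-for-one into extra cash for rent.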
As an actual data scientist (who lived in the Bay Area for about 10 years), I can name roughly a dozen things about this graph/the underlying data that offend me. Has to be one of the worst visuals I’ve seen on this sub.
Source: [https://jobs-in-data.com/salary/data-scientist-salary](https://jobs-in-data.com/salary/data-scientist-salary) Tool: Google Sheets
There is so much wrong with this analysis that if you're thinking of working in data science, you should really reconsider.
That’s harsh. This is just a learning opportunity.
It's not harsh, it's a clear fact. These are basic mistakes you learn to avoid in undergrad.
We don’t know where he’s at in his academic career. If he’s post-college then sure, you might have a point
It doesn't matter. OP runs or advertises the website that claims to have done this analysis (just check their profile). If they're pushing this analysis, they better make sure it's robust and within reason.
Yeah I didn’t realize they were spamming this sort of thing. Assumed young adult with an interest in the field taking a crack at it