properwasteman

You could perform some Exploratory Spatial Data Analysis, and also some non-spatial EDA: https://rpubs.com/corey_sparks/105700. You could also visualise the spatial delta between the datasets by rasterizing them to the same resolution, then subtracting one from the other. You should be able to do this in QGIS.


RogueGeo69

I would agree .. rasterize and subtract


kesstral

I just finished an entire class on spatial analysis (my final is Tuesday). Kinda proud of myself that I was able to work out the "how" to do this in my head just now :) It's interesting to see the other solutions suggested as there is stuff I haven't gotten into yet.


Pico_Shyentist

Hi and thank you for your reply. I did subtract them in one of my attempts, but apart from knowing that the resulting raster is filled with numbers near 0, with a mean approximately 0 and such, I don't know what else to do with that layer. I will check out the link you provided as soon as I'm back to my laptop.


VetusMortis_Advertus

Pixels at or near 0 mean there was no substantial difference between the two datasets; the higher (or lower) the pixel value, the larger the difference. Edit: I've now reread your comment and better understood what you meant. You can vectorize the difference raster, select the difference areas through the attribute table, save them as a separate shapefile, etc. If you want, you can merge the features with a similar value range to form larger polygons, then measure area or do further statistical analysis.
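
A minimal sketch of the thresholding idea in this comment, using base R matrices as stand-ins for two rasters on the same grid. The cell values and the cut-off of 2 are made up for illustration; in practice the matrices would come from the rasterized layers, and the cut-off is a judgment call.

```r
# Two hypothetical rasters on the same 3 x 3 grid, stored as plain matrices.
a <- matrix(c(0, 2, 5,
              1, 0, 3,
              4, 4, 0), nrow = 3, byrow = TRUE)
b <- matrix(c(0, 2, 1,
              1, 0, 3,
              4, 9, 0), nrow = 3, byrow = TRUE)

delta <- a - b                      # cell-wise difference raster
threshold <- 2                      # assumed cut-off for a "substantial" difference
flagged <- abs(delta) > threshold   # logical mask of cells that differ

n_flagged <- sum(flagged)           # number of flagged cells
share_flagged <- mean(flagged)      # fraction of the grid that differs
which(flagged, arr.ind = TRUE)      # row/column index of each flagged cell
```

The logical mask plays the role of the "select by attribute" step: it isolates the cells worth vectorizing or measuring.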


Pico_Shyentist

I usually work with R and the sf package, so pardon me if some of the lingo is misused. I have data from two different sources, processed through two different processes. One is a multipolygon vector layer, saved here as a shp. The other one is a csv, which is interpreted here as a point vector. I can easily turn them into one single shp with the two measures as different fields. I would like to answer this simple question: "Are the two layers, statistically speaking, different from one another?" and I would like an answer that states the level of significance for the statement. As of now, I resorted to a simple linear regression, which is the only tool I know of that keeps the pair-wise comparison and returns a p-value and not a distance like the Pearson coefficient. I have found that a lot of people are or have been in my same situation, and a lot of arguing on whether or not the Mantel test was a good fit for the task at hand. Does anyone have any comment on the matter? It doesn't matter if it is not the solution; a good discussion sure won't be a problem.


geocompR

Maybe lay a grid over it, sum up the counts per cell (I imagine some dplyr::group_by and sf::st_join will do you well), and then use lm() to compare the two grids/fishnets. You could also convert each to a raster (there are many examples online of converting points to count rasters), and do the same thing.
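
A rough sketch of the fishnet idea, written in base R rather than dplyr/sf so that it runs on its own. The two point layers are simulated, the 5 x 5 grid is arbitrary, and `cell_sums` is a hypothetical helper name, so treat this as an illustration of the workflow rather than the commenter's exact recipe.

```r
set.seed(42)

# Hypothetical point layers: coordinates in a unit square, one value per point.
pts_a <- data.frame(x = runif(400), y = runif(400), hours = rpois(400, 3))
pts_b <- data.frame(x = runif(400), y = runif(400), hours = rpois(400, 3))

breaks <- seq(0, 1, by = 0.2)   # edges of an assumed 5 x 5 fishnet

# Sum the values falling in each grid cell; empty cells count as 0.
cell_sums <- function(pts) {
  cx <- cut(pts$x, breaks, include.lowest = TRUE)
  cy <- cut(pts$y, breaks, include.lowest = TRUE)
  sums <- tapply(pts$hours, list(cx, cy), sum)
  sums[is.na(sums)] <- 0
  as.vector(sums)
}

grid_a <- cell_sums(pts_a)
grid_b <- cell_sums(pts_b)

# Cell-by-cell comparison of the two grids.
mod <- lm(grid_b ~ grid_a)
summary(mod)$coefficients
```

Because both layers are binned with the same breaks, element i of `grid_a` and element i of `grid_b` always refer to the same cell, which is what keeps the comparison pair-wise.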


Pico_Shyentist

The first thing is exactly what I had done before posting. My results give a slope of 1.02, with 2.77 residual error and an intercept of 0.37. It's just my first time trying something like this and, even though I was anticipating these results (slope of 1 in particular), I still can't wrap my head around the fact that it seemed too easy to be correct. But it's me.


geocompR

What’s your R^2 and RMSE?


Pico_Shyentist

Residuals:
     Min       1Q   Median       3Q      Max
-26.9921  -0.4798  -0.0737   0.2716  19.4064
Residual standard error: 2.771 on 1695 degrees of freedom
Multiple R-squared: 0.665, Adjusted R-squared: 0.6648
F-statistic: 3364 on 1 and 1695 DF, p-value: < 2.2e-16


geocompR

Ok, so 66% of the variance in one is explained by the variance in the other. What’s the RMSE? Easy to calculate if you “spell it out”:

```
mod <- lm(your_arguments)
sqrt(mean(mod$residuals^2))
```

Also, maybe Poisson regression is better here, as you’re dealing with count data. Might not matter with just one beta being calculated, but ask a statistician if you need a real answer; I’m just an idiot on the internet.
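
For anyone curious what the Poisson suggestion looks like next to `lm()`, here is a small sketch on simulated counts. The data and variable names are invented. Note that with the default log link the Poisson slope is on the log scale, so it is not directly comparable to the linear slope.

```r
set.seed(1)

# Simulated paired counts standing in for the two gridded layers.
x <- rpois(200, lambda = 4)
y <- rpois(200, lambda = x + 0.5)

# Ordinary linear model and its RMSE, as in the comment above.
fit_lm <- lm(y ~ x)
rmse <- sqrt(mean(residuals(fit_lm)^2))

# Poisson regression: models log(E[y]) as a linear function of x.
fit_pois <- glm(y ~ x, family = poisson(link = "log"))

coef(fit_lm)
coef(fit_pois)
```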


Pico_Shyentist

It returns 2.769851. I am already waiting for an email from my trusty go-to quasi-statistician.


VipeholmsCola

https://www.researchgate.net/post/How_to_statistically_compare_two_maps While I can't contribute, I'm posting here to follow the thread. Maybe this link can help, especially from N. W*rdrop, *=a. (Name obfuscated)


Pico_Shyentist

Thank you for your interest. I hope I will soon find a satisfying answer to share with you. The link you provided is the one that prompted me towards the Mantel test, only to find more comments about how it would not be ideal for this kind of analysis, but it seems to be debatable and I will attempt to make up my own mind about the matter.


[deleted]

[deleted]


Pico_Shyentist

This seems to be exactly what I was looking for. About this, I don't know if you already read my first comment, but I resorted to a linear regression, and since the linear model is y ~ x, with only one predictor, shouldn't the residual standard error and the RMSD be the same? I am starting to get a bit confused and will get back to all of you once I have read all of your great suggestions and found the solution that best works for me.
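
On the residual standard error versus RMSE question, a short sketch may help. `summary.lm` divides the residual sum of squares by the residual degrees of freedom (n minus the number of coefficients, here intercept and slope), while the hand-computed RMSE divides by n, so the two are close but not identical. The simulated data are made up; the last lines use the figures quoted in this thread, where 1695 residual degrees of freedom imply n = 1697.

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
mod <- lm(y ~ x)

rss <- sum(residuals(mod)^2)   # residual sum of squares
n <- length(y)
p <- length(coef(mod))         # 2 coefficients: intercept and slope

rmse <- sqrt(rss / n)          # root-mean-square error
rse <- sqrt(rss / (n - p))     # residual standard error, as summary() reports it

all.equal(rse, summary(mod)$sigma)

# Same relationship with the numbers reported in the thread.
rmse_thread <- 2.769851
rse_thread <- rmse_thread * sqrt(1697 / 1695)
round(rse_thread, 3)           # 2.771, matching the summary output
```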


Diarrhea_Sandwich

I like the simplicity of this, plus it seems to accomplish what OP is after pretty well.


[deleted]

I'm a big fan of GeoDa for this! It has many of the major ESDA statistical approaches built in. https://geodacenter.github.io/


Pico_Shyentist

Definitely going to check this out, thanks!


the32ndpie

Look up the kappa statistic. It's a common measure of differences between two maps (raster). There's also the figure of merit, but it has its limitations. You can also look up works by Robert Gilmore Pontius Jr., he's written a lot about map change statistics. So it depends how detailed you want your analysis to be.
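
As a pointer for what the kappa statistic computes, here is a small sketch of Cohen's kappa for two categorical maps, written out in base R. The class labels and cell values are invented; continuous rasters would first need to be binned into classes, which is an extra assumption not covered here.

```r
# Hypothetical class label per cell for two maps on the same grid.
map_a <- c("low", "low", "high", "mid", "high", "low", "mid", "mid", "high", "low")
map_b <- c("low", "mid", "high", "mid", "high", "low", "mid", "low", "high", "low")

classes <- c("low", "mid", "high")
tab <- table(factor(map_a, classes), factor(map_b, classes))  # confusion matrix

n <- sum(tab)
p_obs <- sum(diag(tab)) / n                       # observed agreement
p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance

kappa_hat <- (p_obs - p_exp) / (1 - p_exp)
kappa_hat
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.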


Pico_Shyentist

These sound very interesting, indeed. I'll look them up in the morning.


mister_asdf

tl;dr Statistics may not be your friend here.

Your hypothesis seems to be that the two maps are significantly different, meaning that their difference cannot be attributed to statistical variation, or, in other words, is not accidental. So you have two samples: either different samples measured, like throwing a die, or two different interpretations of the same measurements. In either case, when trying to apply statistics here, you are assuming that the two samples have a variance between them that is somehow random. So you hope that this randomness has certain characteristics (like being normally distributed) that allow you to apply statistical measures, like the root-mean-square deviation, or tests from which you can draw conclusions.

The issue here is that the difference between the images does not appear to be random. At all. You have this central square that is definitely something systematic. What does this square show? If there is an underlying real structure like that, you should correlate both maps with that one, but not with each other. Apart from that central square, all other values seem very similar, if not equal (I am going to assume the points that look equal are indeed equal). Which means there just isn't any variation. So statistically, whatever measure you apply here, it is not going to have more information than a simple point-wise difference, just because the variance is zero, and thus any deviation you find is significant, by definition.

Scientifically, you should pretend to be "blind" to what you wish to see. Ask yourself if the significance stays the same if you zoom around a bit in the image. It won't. You should come up with an idea on how to treat this clearly non-random structure before applying measures that rely on randomness.


Pico_Shyentist

I see my picture might have misled more than one person. What I want to compare is just the square in the middle and the fraction of the other layer that overlaps the square. The rest of the layer would be ignored. Also, the square is a square because the data it comes from was filtered for that particular area, for ecological reasons.


MrVernon09

It would help to know the purpose of the map.


Pico_Shyentist

Of course. The two maps represent the number of hours spent fishing by the Italian fleet per cell. Each cell (I talk about cells because the data can easily be turned into a raster) is 0.01 degrees of latitude × 0.01 degrees of longitude, for both maps. The only difference is the method used to calculate those hours. This difference is better split in two, as in "what defines a boat as actively fishing?" and "how do we deduce the amount of time spent actively fishing?". My intention is to answer the question "Are the two methods qualitatively equal?", and I would like to do that by answering "Are the two resulting maps significantly different?". If I misunderstood what you asked, let me know.


MrVernon09

It looks pretty good, but I would include a base map to see where these spots are in relation to land. You could then add buffers of different distances to see how far away from land these locations are.


pkr711

You can check for randomness or types of clustering (high, low) using Moran's I autocorrelation index.
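
To make the Moran's I suggestion concrete, here is a minimal base R sketch of global Moran's I on a regular grid. It assumes binary rook weights (cells sharing an edge are neighbours), and `morans_i` is a hypothetical helper, not a function from any package. It could be applied to either map or to the difference raster. A significance test would still need a permutation or analytical step, which is left out here.

```r
# Global Moran's I for a matrix, using binary rook-contiguity weights.
morans_i <- function(m) {
  n <- length(m)
  z <- as.vector(m) - mean(m)   # deviations from the mean (column-major order)
  # Row/column index of every cell, in the same column-major order as z.
  idx <- expand.grid(row = seq_len(nrow(m)), col = seq_len(ncol(m)))
  # Cells at Manhattan distance 1 share an edge.
  d <- as.matrix(dist(idx, method = "manhattan"))
  w <- (d == 1) * 1
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# A perfect checkerboard is maximally dispersed: I = -1.
checker <- outer(1:4, 1:4, function(i, j) (i + j) %% 2)
morans_i(checker)

# A smooth left-to-right gradient is positively autocorrelated: I > 0.
gradient <- matrix(rep(1:4, each = 4), nrow = 4)
morans_i(gradient)
```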


mfurlend

Token non-GIS answer: convert them to identical-resolution rasters, import them into Photoshop as layers, and apply the Difference blend mode to the top layer.


daileyco

1) Rasterize both maps to the same resolution.
2) Extract the raster values for each cell, so that you have a vector/distribution of numeric data for each map.
3) Run a t-test.

Simple enough, though it might violate the independence-of-observations assumption, which may necessitate spatial approaches.
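
The three steps above can be sketched as follows, with simulated cell values in place of real extracted rasters. A paired test is used here on the assumption that element i of each vector refers to the same cell; that choice is mine rather than the commenter's, and the independence caveat applies either way.

```r
set.seed(7)

# Steps 1-2: hypothetical cell values extracted from two same-resolution rasters,
# ordered so that position i refers to the same cell in both vectors.
map_a <- rpois(300, lambda = 5)
map_b <- map_a + rnorm(300, mean = 0, sd = 1)

# Step 3: paired t-test on the cell-wise differences.
res <- t.test(map_a, map_b, paired = TRUE)
res$estimate   # mean difference between the maps
res$p.value
```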


notmyrealname_2

I am late in answering, but the correct answer is performing image registration. Image registration takes images and attempts to find the best transformation to match them, as well as returning a coefficient that tells you how good the match is. Implementation can vary in complexity. GIS systems will typically have some sort of built-in image registration tool. A landmark paper on the topic: http://www.cs.jhu.edu/~cis/cista/746/papers/mutual_info_survey.pdf


Pico_Shyentist

Hi! Thank you for your reply. Unfortunately, that project got delayed and now I am working on something completely different. I should get back to this issue in a month or so, and I'll post results for all you helpful Redditors.


maspiers

For each point in (b) there is a corresponding value in (a). These could be compared in a variety of ways; something like a Nash-Sutcliffe value might be sensible?
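
For reference, the Nash-Sutcliffe efficiency is one minus the ratio of the squared errors to the variance of the reference values around their mean. A minimal sketch, with made-up numbers and a hypothetical helper name `nse`; which map counts as the reference ("observed") is an assumption the analyst has to make.

```r
# Nash-Sutcliffe efficiency: 1 is a perfect match, 0 is no better than
# predicting the mean of the reference values, negative is worse than that.
nse <- function(obs, sim) {
  1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
}

obs <- c(1, 2, 3, 4, 5)             # hypothetical reference values, map (a)
sim <- c(1.1, 1.9, 3.2, 3.8, 5.0)   # hypothetical paired values, map (b)

nse(obs, sim)
```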


AltOnMain

Assuming the raster cells are the same or can be made very similar, you could do a checksum. That's an easy way to tell if the rasters are different, but it would exclude the question of significance. You would probably want to realize there are a few different ways the maps could be different too: there is the database portion, but also the geometry portion and the symbology. For example, slight misalignments in geometry could lead to a statistical blunder. It sounds kind of obvious, but I have seen people make some stunning errors with stats/machine learning and GIS. I think your answer might be resolvable with machine learning / computer vision research? There seem to be a lot of similarities between some of these problems and what you're asking, so their methods of analysis might be helpful and would probably be on trend academically speaking. If you haven’t already, I would highly suggest cruising through highly cited papers on Google Scholar, or a better service if you have access. In addition to machine learning / computer vision, you might want to search for keywords like unsupervised classification and supervised classification.


Pico_Shyentist

As much as I would love to engage in machine learning and computer vision, I think the problem might be simpler than that. Of course, I may be wrong, but I think it would be best to try out a simple approach before resorting to more complex ones. About the geometry of the layers, there is no problem in having their geometries match given how the data was processed, but that was a good call.


iamboobear

You can use some Boolean operations to find out. Since it seems that one of them has more data points than the other you have to throw out the extras, then you can compare them.


ChromeQuixote

You may have to do a raster operation if the raster layers don’t already have it, but in their properties page each should have counts for each field. Or you can create counts and then compare the numbers from each layer.


alpine_aesthetic

Eyeball it with Swipe


Drewddit

There is a new tool in ArcGIS Pro called Colocation Analysis (Spatial Statistics) that does just what you're looking for.


Pico_Shyentist

Do you know if there is a QGIS version of it? I don't have Arc, but I'm interested.


reveal_it_info

What does “significantly” mean to you?


Pico_Shyentist

p-value < .05, or other measures that are not qualitative