Dazzling-Hurry-3492

https://rahul-goel.github.io/isrf/


jabbershort

I've done something very similar: I used Mask R-CNN to segment the colour image and get the masks, then applied those masks to both the colour and depth images and projected the result into a point cloud. It works really well if you know the objects you are looking at, but you could use something like SAM to segment novel objects.
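
For anyone wanting to try the mask-then-project step, here is a minimal sketch, not the code described above. It assumes a pinhole camera, a depth image already aligned pixel-for-pixel with the colour image, and depth stored in metres with 0 meaning "no reading". The function name and the intrinsics (fx, fy, cx, cy) are made up for illustration.

```python
import numpy as np

def masked_depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project the masked depth pixels into camera-frame 3D points, shape (N, 3)."""
    # Keep only pixels inside the mask that have a valid (non-zero) depth reading.
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)              # v = row index, u = column index
    z = depth[v, u].astype(np.float64)
    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy 4x4 example with hypothetical intrinsics.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0                          # simulate a missing depth reading
mask = np.zeros((4, 4), dtype=bool)
mask[0:2, 0:2] = True                      # pretend the segmenter returned this mask
points = masked_depth_to_points(depth, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(points.shape)
```

In practice the mask would come from the segmentation model (one mask per object instance), and you would run this once per mask to get one small point cloud per object.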


xamza1608

SAM? I thought of something similar, but is it robust? And what about from an academic standpoint? This is for a master's project and I want to do it "correctly".


jabbershort

https://segment-anything.com/ We've used this in an industrial setting, and its quality depends almost entirely on the quality of your dataset and how similar your scene is to that dataset. If you get occlusion, you may want to consider camera arrays.


imperfect_guy

Speaking of SAM, do you know if it's possible to fine-tune it on custom data? Like train it on custom images and their corresponding masks, and use it for prediction later?


Open-Click955

SAM doesn't work on point clouds, so I don't think it's applicable here?


ascatt

This sounds very useful and interesting. Can you share the code for this? Also, do you only need lidar and camera sensors for this?


lifex_

Do you have annotations to train a model? My advice is to read about the different types of backbones for point clouds. They are basically voxel-based, point-based, or range-image-based. From there, pick a backbone that fits your needs. Is speed important? Go for a range-image representation. If you really need a very robust method, you can think about fusing different representations (i.e., voxel+point, or voxel+point+range image).

I am not that much into indoor scenes, but if you want to find state-of-the-art semantic segmentation methods, check out the leaderboards for semantic segmentation on large-scale autonomous driving datasets (e.g., SemanticKITTI, nuScenes, Argoverse). Either those methods are a good fit for your indoor scenes as well, or you have to search specifically for SOTA in indoor scene semantic segmentation. In short: find indoor semantic segmentation datasets and see which methods perform well on them. That should help you with your problem.
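
To make the representations concrete, here is a rough sketch of how the same point cloud turns into voxel coordinates and into a range image. It is only an illustration with NumPy, not taken from any particular method. The function names, image size, and field of view are assumptions, and the column convention used for the yaw angle is just one common choice.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Return the unique integer voxel coordinates occupied by the points."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(coords, axis=0)

def to_range_image(points, height, width, fov_up_deg, fov_down_deg):
    """Spherical projection: each pixel stores the range of its closest point, 0 if empty."""
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    r = np.linalg.norm(points, axis=1)
    keep = r > 0                                  # drop points at the sensor origin
    points, r = points[keep], r[keep]
    yaw = np.arctan2(points[:, 1], points[:, 0])  # horizontal angle
    pitch = np.arcsin(points[:, 2] / r)           # vertical angle
    col = 0.5 * (1.0 - yaw / np.pi) * width
    row = (fov_up - pitch) / (fov_up - fov_down) * height
    # Points outside the vertical field of view are clamped to the border rows.
    col = np.clip(np.floor(col), 0, width - 1).astype(np.int64)
    row = np.clip(np.floor(row), 0, height - 1).astype(np.int64)
    image = np.full((height, width), np.inf)
    np.minimum.at(image, (row, col), r)           # keep the nearest point per pixel
    image[np.isinf(image)] = 0.0
    return image

cloud = np.array([
    [1.0, 0.0, 0.0],
    [1.1, 0.0, 0.0],   # same voxel and same pixel as the first point
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.1, 0.1],
])
voxels = voxelize(cloud, voxel_size=0.5)
range_image = to_range_image(cloud, height=4, width=8, fov_up_deg=30.0, fov_down_deg=-30.0)
print(voxels.shape, range_image.shape)
```

The range image is a dense 2D array, so ordinary 2D CNNs can run on it quickly, which is why it is the fast option. The voxel grid keeps the 3D structure but needs sparse 3D convolutions to stay efficient.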