Weakly Supervised Learning Method for Semantic Segmentation
of Large-Scale 3D Point Cloud Based on Transformers

Zhaoning Zhang, Tengfei Wang , Xin Wang, Zongqian Zhan


Introduction


Semantic segmentation is a key technique that assigns a semantic label to each individual point in a point cloud. However, the large demand for supervised data and the difficulty of learning local features of point clouds remain unsolved problems.

To improve 3D point features, inspired by the idea of the transformer, we employ a so-called LCP network that extracts better features by computing attention between target 3D points and their local neighbors via local context propagation.
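As a rough sketch of the idea of attending over a point's local neighborhood (not the authors' implementation; the function name and the plain scaled dot-product form are our assumptions), one attention step for a single target point could look like:

```python
import math

def local_attention(query, neighbors):
    """Attend a target point's feature (query) over its k nearest
    neighbors' features -- a stand-in for one local attention step.
    query: list[float] of dim d; neighbors: list of list[float]."""
    d = len(query)
    # Scaled dot-product score between the query and each neighbor.
    scores = [sum(q * n for q, n in zip(query, nb)) / math.sqrt(d)
              for nb in neighbors]
    # Softmax over the local neighborhood only (not the whole cloud).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Aggregate neighbor features with the attention weights.
    return [sum(w * nb[i] for w, nb in zip(weights, neighbors))
            for i in range(d)]
```

Restricting the softmax to the k neighbors keeps the cost linear in the number of points, which is what makes attention feasible on large-scale clouds.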

Training a transformer-based network needs a large amount of training samples, and annotating them is labor-intensive, costly and error-prone. This work therefore proposes a weakly supervised framework: pseudo-labels are estimated from the feature distances between unlabeled points and class prototypes, which are calculated from the labeled data.
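The prototype idea can be sketched in a few lines (a minimal illustration, not the paper's code; the helper names and the use of Euclidean distance here are our assumptions):

```python
def class_prototypes(features, labels):
    """Mean feature per class, computed from the labeled points only."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def pseudo_label(feature, prototypes):
    """Assign an unlabeled point the class of its nearest prototype
    by feature distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda y: dist2(feature, prototypes[y]))
```

In this way the few labeled points supervise the whole cloud: every unlabeled point inherits the class whose prototype lies closest in feature space.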

The methodology and workflow of our approach are illustrated in the figure below. We begin by feeding the point cloud into an LCP network to predict initial semantic information. Next, we employ a momentum-based prototype strategy to generate pseudo-labels for the unlabeled points. These pseudo-labels, together with the network predictions, are then optimized through the loss function.
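The role of momentum is to keep the prototypes stable across training batches; a common way to realize this is an exponential moving average (a sketch under that assumption — the coefficient value is illustrative, not taken from the paper):

```python
def momentum_update(old_proto, batch_proto, m=0.99):
    """Exponential moving average of a class prototype across batches.
    m (assumed 0.99 here) controls how slowly the prototype drifts
    toward the statistics of the current batch."""
    return [m * o + (1.0 - m) * b for o, b in zip(old_proto, batch_proto)]
```

A large m means a noisy batch barely moves the prototype, so the pseudo-labels derived from it change smoothly over training.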


Network


We propose an effective weakly supervised framework based on the Transformer, and the overview of the framework is illustrated in the figure below. Our approach combines a Transformer network with LCP (Local Context Propagation) modules and pseudo-label generation techniques to achieve better semantic segmentation results with only a small amount of real annotations.


We construct a UNet-like network for semantic segmentation using 4 LCP blocks and 4 up-sampling layers, since dense prediction requires a feature for every point. Before entering the first LCP block, the data passes through a shared MLP.

The feature dimensions of the four encoder stages are 128, 256, 512, and 1024, respectively. The input consists of 40,960 points.
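Under the usual UNet convention that the decoder mirrors the encoder (an assumption on our part; the paper only states the encoder dimensions), the per-stage channel widths would run:

```python
def unet_feature_dims(encoder_dims):
    """Channel dimensions through a UNet-like network: the stated
    encoder stages on the way down, then a mirrored decoder
    (assumed here, not specified in the text) on the way up."""
    decoder_dims = list(reversed(encoder_dims[:-1]))  # mirror, skip bottleneck
    return encoder_dims + decoder_dims
```

So with encoder stages 128, 256, 512, 1024, the up-sampling path would return through 512, 256, 128, ending with a per-point feature for dense prediction.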


Experiments


To demonstrate the efficacy of our proposed PL-LCP, we evaluate 3D semantic segmentation on both indoor and outdoor scenarios using two large-scale point cloud datasets. First, we conduct two ablation experiments to validate the ability of the LCP module to integrate inter-block information and the effect of pseudo-labels. Then, our method is compared with other relevant approaches, primarily to demonstrate the effectiveness of the PL-LCP network architecture. Our experimental environment is: Intel Core i7-8700 CPU (3.70 GHz), 64 GB RAM, NVIDIA GeForce RTX 4090 24 GB GPU, 64-bit Ubuntu 22.04.3 LTS (kernel 5.4.0-149-generic).

We trained the network for 200 epochs using the Adam optimizer, with momentum 0.9, batch size 4 and weight decay 0.0001. The initial learning rate was 0.01 and was decreased by a factor of 10 at epoch 120.
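The schedule above amounts to a single-step decay, which can be written out as (a sketch; the function name is ours):

```python
def learning_rate(epoch, base_lr=0.01, drop_epoch=120, factor=0.1):
    """Step schedule from the training setup: start at base_lr (0.01)
    and multiply by factor (0.1) once training reaches drop_epoch (120)."""
    return base_lr * (factor if epoch >= drop_epoch else 1.0)
```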
Below we list the configurations that have already been tested, with the corresponding results:

Part  LCP  OA(%)  mAcc(%)  mIOU(%)  Labels
1      ✓    90.2    74.3     67.6   fully
2      ×    87.6    74.5     64.6   fully
3      ✓    90.1    74.4     67.1   10%
4      ✓    89.2    73.2     65.9   1%


Performance comparisons with existing SOTA methods on the SensatUrban test set
Methods         OA(%)  mIOU(%)  Ground  Veg.  Build.  Wall  Bridge  Park.  Rail  Traffic  Street  Car   Path  Bike  Water
PointNet         80.8    23.7    68.0   89.5   80.0    0.0    0.0    4.0    0.0   31.6     0.0   35.1   0.0   0.0    0.0
PointNet++       84.3    32.9    72.5   94.2   84.8    2.7    2.1   25.8    0.0   31.5    11.4   38.8   7.1   0.0   56.9
TangentConv      77.0    33.3    71.5   91.4   75.9   35.2    0.0   45.3    0.0   26.7    19.2   67.6   0.0   0.0    0.0
SPGraph          85.3    37.3    69.9   94.6   88.9   32.8   12.6   15.8   15.5   30.6    23.0   56.4   0.5   0.0   44.2
SparseConv       88.7    42.7    74.1   97.9   94.2   63.3    7.5   24.2    0.0   30.1    34.0   74.4   0.0   0.0   54.8
KPConv           93.2    57.6    87.1   98.9   95.3   74.4   28.7   41.4    0.0   56.0    54.4   85.7  40.4   0.0   86.3
RandLA-Net       89.8    52.7    80.1   98.1   91.6   48.9   40.8   51.6    0.0   56.7    33.2   80.1  32.6   0.0   71.3
PL-LCP (ours)    93.9    67.3    83.5   98.7   96.3   72.3   84.2   57.0   46.9   74.5    54.9   90.1  43.5   0.0   72.8
Performance comparisons with previous methods on S3DIS
Methods            OA(%)  mAcc(%)  mIOU(%)
PointNet             -      23.7     41.1
TangentConv         82.5    63.2     52.8
SPGraph             86.4    66.5     58.0
LocalTransformer    87.6    71.9     64.1
RandLA-Net          87.2    71.4     62.4
PSNet               87.8     -       64.9
PL-LCP (ours)       90.2    74.3     67.6
If you would like your test results shown here, please send them to us by e-mail. Thanks for your support!


About us

If you have any questions or advice, please contact us by e-mail.

This work was jointly supported by the Natural Science Foundation of Hubei Province, China (2022CFB727) and the National Natural Science Foundation of China (42301507).