Heterogeneous distributed big data clustering on sparse grids (Q2312425): Difference between revisions

Summary: Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes without requiring a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant of the sparse grid clustering algorithm that is suited for big data settings. Our compute kernels were implemented in OpenCL to enable portability across a wide range of architectures. For distributed environments, we added a manager-worker scheme that was implemented using MPI. In experiments on two supercomputers, Piz Daint and Hazel Hen, with up to 100 million data points in a ten-dimensional dataset, we show the performance and scalability of our approach. The dataset with 100 million data points was clustered in 1198 s using 128 nodes of Piz Daint. This translates to an overall performance of 352 TFLOPS. At the node level, we provide results for two GPUs, Nvidia's Tesla P100 and the AMD FirePro W8100, and one processor-based platform that uses Intel Xeon E5-2680v3 processors. In these experiments, we achieved between 43% and 66% of the peak performance across all compute kernels and devices, demonstrating the performance portability of our approach.

zbMATH Keywords

clustering

machine learning

distributed computing

performance portability

GPGPU

OpenCL

peak performance

describes a project that uses

k-means++

DENCLUE

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.3390/a12030060

cites work

A local search approximation algorithm for k-means clustering

Q2934696

Q3286740

Spatially adaptive sparse grids for high-dimensional data-driven problems

A New Subspace-Based Algorithm for Efficient Spatially Adaptive Sparse Grid Regression, Classification and Multi-evaluation

Q2730466

From Data to Uncertainty: An Efficient Integrated Data-Driven Sparse Grid Approach to Propagate Uncertainty

Locality-sensitive hashing scheme based on p-stable distributions

Identifiers

zbMATH Open document ID

1461.68210

DOI

10.3390/a12030060

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:2312425

@@ Property / full work available at URL @@
+https://doi.org/10.3390/a12030060
@@ Property / full work available at URL: https://doi.org/10.3390/a12030060 / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W2912733649
@@ Property / OpenAlex ID: W2912733649 / rank @@
+Normal rank
@@ Property / cites work @@
+A local search approximation algorithm for k-means clustering
@@ Property / cites work: A local search approximation algorithm for k-means clustering / rank @@
+Normal rank
@@ Property / cites work @@
+Q2934696
@@ Property / cites work: Q2934696 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3286740
@@ Property / cites work: Q3286740 / rank @@
+Normal rank
@@ Property / cites work @@
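+Spatially adaptive sparse grids for high-dimensional data-driven problems
@@ Property / cites work: Spatially adaptive sparse grids for high-dimensional data-driven problems / rank @@
+Normal rank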
@@ Property / cites work @@
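+A New Subspace-Based Algorithm for Efficient Spatially Adaptive Sparse Grid Regression, Classification and Multi-evaluation
@@ Property / cites work: A New Subspace-Based Algorithm for Efficient Spatially Adaptive Sparse Grid Regression, Classification and Multi-evaluation / rank @@
+Normal rank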
@@ Property / cites work @@
+Q2730466
@@ Property / cites work: Q2730466 / rank @@
+Normal rank
@@ Property / cites work @@
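+From Data to Uncertainty: An Efficient Integrated Data-Driven Sparse Grid Approach to Propagate Uncertainty
@@ Property / cites work: From Data to Uncertainty: An Efficient Integrated Data-Driven Sparse Grid Approach to Propagate Uncertainty / rank @@
+Normal rank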
@@ Property / cites work @@
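+Locality-sensitive hashing scheme based on p-stable distributions
@@ Property / cites work: Locality-sensitive hashing scheme based on p-stable distributions / rank @@
+Normal rank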