GIScience 2020 Workshop


Towards universal segmentation of geodata

Format: Half-day workshop with limited tutorials

In recent years, we have experienced a real flood of spatial data, resulting from satellite missions as well as from manually generated data. Processing these data therefore requires substantial dimensionality reduction to make them understandable to humans. Segmentation of geodata is an essential step towards data classification, in which complex patterns are reduced to larger homogeneous objects. Segmentation by itself is nothing more than clustering (in information space) with additional spatial constraints (in Cartesian space). The segmentation process takes spatial data, which exist in many different forms, as input and assigns a new category (a segment label) to each unit of input data. Proper segmentation attempts to minimize the internal differentiation and maximize the spatial coherence of segments. The right balance between internal homogeneity and spatial coherence is unknown, and the number and size of segments depend combinatorially on the number of objects subject to segmentation, which makes the entire problem NP-complete. Nowadays, segmentation of geodata is mostly associated with the segmentation of remotely sensed images, where the basic unit is a pixel and its signature consists of the values of the image bands at the pixel position. The decision whether two pixels belong to the same group depends on a similarity/dissimilarity measure, usually in the form of the Euclidean distance between two feature vectors.
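As a minimal illustration of the pixel-based case described above, the sketch below computes the Euclidean distance between pixel signatures. The band values and the names `pixel_a`, `pixel_b`, and `pixel_c` are invented for illustration and do not come from any real image.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical three-band signatures (e.g., reflectance values) of three pixels.
pixel_a = [0.12, 0.30, 0.45]
pixel_b = [0.15, 0.26, 0.45]
pixel_c = [0.60, 0.10, 0.05]

d_ab = euclidean_distance(pixel_a, pixel_b)
d_ac = euclidean_distance(pixel_a, pixel_c)

# A smaller distance means the two pixels are more likely to share a segment.
print(f"d(a, b) = {d_ab:.3f}, d(a, c) = {d_ac:.3f}")
```

In this toy case, `pixel_a` is much closer to `pixel_b` than to `pixel_c`, so a pixel-based method would tend to group the first two together, provided they are also spatially adjacent.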

Since 2011, the Space Informatics Lab has been developing a new technology in which the analysis relies on the similarity between motifs, that is, the content of larger raster units called motifels. Each intricate motif is converted to a much simpler, rotationally invariant signature, usually in the form of a sparse histogram. Each bin of such a histogram represents not only a category but also its spatial relation to other elements of the motifel. We also adopted several similarity measures that work efficiently with such histograms. In this way, we moved from pixel-based to pattern-based segmentation. Later we added the ability to work with other representations of data, e.g., time series, map collections, and DEMs, and released GeoPAT, a set of tools addressing different types of data. We also introduced a transformation of the grid topology from the default square grid (as in all raster data) to a brick-wall topology (similar to a honeycomb). In the last year, we have extended that idea to work with topological structures of vector data such as points, polygons, and line networks.
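The idea of a motifel signature can be sketched with a simple co-occurrence histogram. This is an illustrative assumption, not necessarily the signature implemented in GeoPAT: each bin counts an unordered pair of categories found in 4-adjacent cells, so the bins encode both the categories and their spatial relation. Such a histogram is unchanged by 90-degree rotations of the tile, which is a weaker property than full rotational invariance. The function names and the example tile are hypothetical.

```python
from collections import Counter


def cooccurrence_signature(tile):
    """Normalized sparse histogram of unordered category pairs in 4-adjacent cells."""
    counts = Counter()
    rows, cols = len(tile), len(tile[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbor
                counts[tuple(sorted((tile[r][c], tile[r][c + 1])))] += 1
            if r + 1 < rows:  # neighbor below
                counts[tuple(sorted((tile[r][c], tile[r + 1][c])))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def rotate90(tile):
    """Rotate a tile (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]


# Hypothetical 3x3 motifel of land cover categories 1, 2, and 3.
tile = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]

sig = cooccurrence_signature(tile)
sig_rot = cooccurrence_signature(rotate90(tile))

# Only pairs that actually occur are stored, which keeps the histogram sparse.
print(sig)
print("same signature after rotation:", sig == sig_rot)
```

Storing the histogram as a dictionary keeps only the non-empty bins, which matters when the number of possible category pairs is much larger than the number observed in a single motifel.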

Based on our 10 years of experience working with different areas and representations of geodata, we can formulate three simple rules that lead towards universal segmentation of geodata of any type and content:

  • Each unit subject to segmentation must be represented in the same form, i.e., as a sparse vector;
  • Similarity measures should be scaled to the same range, preferably [0,1], where 0 means no similarity and 1 means identical;
  • The dataset subject to segmentation does not need to be regular (like a raster) but must have a topological structure: each object must know its neighbors.
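The three rules can be sketched in a few lines of Python. The example assumes that signatures are normalized histograms stored as dictionaries and uses histogram intersection as the similarity measure. That measure is chosen here only because it naturally falls in [0,1] for normalized histograms; it is not claimed to be the measure used in our tools. The unit names, categories, and neighbor lists are hypothetical.

```python
def histogram_intersection(p, q):
    """Similarity of two normalized sparse histograms, in the range [0, 1]."""
    return sum(min(value, q[key]) for key, value in p.items() if key in q)


# Rule 1: every unit has the same form, a sparse vector (here: a dict).
units = {
    "a": {"forest": 0.8, "water": 0.2},
    "b": {"forest": 0.7, "water": 0.3},
    "c": {"urban": 1.0},
}

# Rule 3: the dataset need not be regular, but each unit knows its neighbors.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

# Rule 2: similarity is scaled to [0, 1]; 0 = no similarity, 1 = identical.
sim_ab = histogram_intersection(units["a"], units["b"])
sim_bc = histogram_intersection(units["b"], units["c"])
sim_aa = histogram_intersection(units["a"], units["a"])

print(f"sim(a, b) = {sim_ab:.2f}, sim(b, c) = {sim_bc:.2f}, sim(a, a) = {sim_aa:.2f}")
```

Because the similarity depends only on the sparse vectors and the neighbor lists, the same code applies whether the units are pixels, motifels, or polygons.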

These rules mean that the key point of universal segmentation is the representation of data as a graph. Each node of the graph represents a spatial unit (pixel, motifel, time series, point, polygon, etc.). The weights of the arcs between nodes represent the similarity between units, and the node signature represents the pattern contained within the unit. The process begins with a set of converters that gather data provided in standard GIS form and transform them into a universal graph. A segmentation algorithm then operates on the structure of the graph. As a result, we obtain segment labels that are finally assigned to the original data. The converters or, more precisely, the algorithms that convert motifs inside the data units are the subject of our research, and we hope to discuss them extensively during the workshop.
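The pipeline described above (signatures, graph, segmentation, labels) can be sketched as follows. The segmentation step here is a deliberately simple stand-in: neighboring units are merged with a union-find structure whenever their similarity reaches a threshold. It is chosen for brevity and is not the algorithm used in GeoPAT. All names, signatures, and the threshold value are hypothetical.

```python
def histogram_intersection(p, q):
    """Similarity of two normalized sparse histograms, in the range [0, 1]."""
    return sum(min(value, q[key]) for key, value in p.items() if key in q)


def build_graph(signatures, neighbors):
    """Return arcs {(u, v): similarity} for every pair of neighboring units."""
    arcs = {}
    for u, nbrs in neighbors.items():
        for v in nbrs:
            key = tuple(sorted((u, v)))
            if key not in arcs:
                arcs[key] = histogram_intersection(signatures[u], signatures[v])
    return arcs


def segment(signatures, arcs, threshold):
    """Merge neighbors whose similarity is at least `threshold`; return labels."""
    parent = {u: u for u in signatures}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path compression
            u = parent[u]
        return u

    for (u, v), weight in arcs.items():
        if weight >= threshold:
            parent[find(u)] = find(v)

    roots, labels = {}, {}
    for u in signatures:
        labels[u] = roots.setdefault(find(u), len(roots))
    return labels


# Hypothetical node signatures and topology: four units in a chain.
signatures = {
    "p1": {"forest": 0.9, "water": 0.1},
    "p2": {"forest": 0.8, "water": 0.2},
    "p3": {"urban": 0.7, "forest": 0.3},
    "p4": {"urban": 0.9, "forest": 0.1},
}
neighbors = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}

arcs = build_graph(signatures, neighbors)
labels = segment(signatures, arcs, threshold=0.5)
print(labels)
```

The resulting labels are keyed by unit identifier, so they can be joined back to the original raster cells or vector features. Only neighboring units are ever compared, which is how the spatial constraint enters the clustering.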

Primary contact:

Participants: 10-25


Opening remarks (30 minutes – Tomasz Stepinski, remotely):

  • What is and what is not segmentation of large geospatial datasets?
  • What is a motif of data and how can it be represented?
  • Concept of similarity between patterns.

GeoPAT Basics (1 hour, Netzel, Jasiewicz, Nowosad, Niesterowicz):

  • Data preprocessing to universal grids.
  • Process of segmentation and its parameters.
  • Interpretation of the results.
  • Canadian forestry data
  • Segmentation of climate


Representation of complex data using patterns (1 hour, Nowosad, Netzel):

  • What is a complex pattern?
  • Representation of complex patterns on a single map layer.
  • How to represent patterns for several map layers (such as land cover, soil types, land forms) simultaneously?
  • How to measure distance between time series in spatio-temporal data.
  • Applications of complex data patterns.

Segmentation of vector data using Python (45 minutes, Jasiewicz, Ośko):

  • How can we segment vector data using the same principles as for raster data?
  • Accessing topology in a Python environment.
  • Steps of segmentation.
  • Practice.
  • Discussion.

Discussion of results across different types of data (15 minutes, Netzel, Jasiewicz, Nowosad, Niesterowicz):

  • Three pillars of segmentation: data structure as a graph, nodes as signatures, arcs as similarity.
  • Similarity and its impact on final results.
  • Extensibility and future ideas.


Published on March 10th, 2020