On the PostGIS discussion list, a question was asked:
> I have a polygon table that has many small areas and holes. Now, I would
> like to remove small areas and holes that are 2800 m^2.
>
> Any help or advice would be really appreciated.
I decided to have a bash at this as part of my continuing to learn PostGIS.
This article will firstly look at a simple (single) polygon (I will use WKT to construct a polygon with two inner holes) which will make it easy for my readers to replicate. Then it will consider how to handle situations where a supplied polygon has no holes (inner rings) and how to handle polygons and multipolygons together before concluding with some performance figures.
1. Single Polygon with Inner Rings (holes)
The function that can help us do what we want is the ST_DumpRings(geometry) function. Here we can see how the function, applied to our test geomety, will extract all the rings for processing:
SELECT ST_AsText(b.the_geom)AS final_geom, ST_Area(b.the_geom)AS area
FROM(SELECT(ST_DumpRings(a.geom)).geom AS the_geom
My first attempt to use ST_DumpRings to achieve the required outcome resulted in the following SQL (note how we can apply our area filter to the extracted rings):
SELECT ST_AsText(c.the_geom)AS final_geom
FROM(SELECT ST_Collect(b.the_geom)AS the_geom
FROM(SELECT(ST_DumpRings(a.geom)).geom AS the_geom
Note that the output is a MULTIPOLYGON with each ring being a separate polygon in the collection and not a single POLYGON (as was the source data) with inner rings which is what we do want as the result.
Searching the PostGIS documentation indicated that I should be able to reconstitute the rings into a single polygon via use of the ST_Difference aggregate. Here we want to take the outer ring as the first argument to ST_Difference and subtract the set of all inner rings from it:
ERROR: set-valued FUNCTION called IN context that cannot accept a SET
This is because the ST_Collect returns geometry[] (ie a multi-geometry) and not a single geometry. (See documentation note: “Do not call with a GeometryCollection as an argument”)
Are there other solutions?
The ST_MakePolygon function looks useful:
ST_MakePolygon(linestring, [linestring[]])
Creates a Polygon formed by the given shell and array of holes. You can construct a geometry array using ST_Accum.
Though it requires linestring and not polygon arguments which will result in more complex SQL (as is shown below):
The splitting of the SQL into the two halves to extract the outer ring and the inner rings (separately) had to be done because the only method of reconstructing the polygon was via the ST_MakePolygon function and it needs two inputs: a linestring for the outer ring and an array of linestrings for the inner rings left after being filtered by area. Sadly, ST_Collect() on the straight output of ST_DumpRings (with no filtering by path only area) just generates a multipolygon. I tried playing around with ST_Intersection etc but these still return a multipolygon.
Now, this SQL is fine for a single, hardcoded test geometry, but I think it would be messy if we tried to use it to process a lot of polygons in a table. The best way to approach this is to encapsulate the above SQL (a complete algorithm) in a function and then use the function when processing our tabular data.
After submission to the PostGIS discussion list, the greatly skilled and experienced Regina Obe suggested a few small amendments to my SQL and function.
“You should do this:
SELECT (ST_DumpRings(a.geom)).*
Instead of this:
SELECT (ST_DumpRings(a.geom)).geom As the_geom, path(ST_DumpRings(a.geom)) as path
(Which would mean in the upper part you would need to reference by .geom instead of the_geom.)
The reason for that is internally PostgreSQL will call ST_DumpRings twice [cf Deterministic functions in Oracle). This was pointed out to me by a very experienced PostgreSQL developer: His blog entry about it is here”
Implementing this suggestion leads to this change in the filter_rings function.
However, Regina also pointed out a way to shorten the SQL to something like that which I wanted at the start of this article by suggesting:
“I think ST_BuildArea might be better than ST_MakePolygon in this regard. It will work fine with a single closed ring and if multiple, it turns the inners to holes.
What I was thinking of was this.
CREATEORREPLACEFUNCTION upgis_filter_rings(geometry,FLOAT)RETURNS geometry AS
“BuildArea just assumes everything outside is a polygon and everything inside is hole where as ST_MakePolygon takes your categorization of hole/vs shell as gospel.
“The main disadvantage aside from possibly speed over Simon’s is that if you have 3D polygons, I think the above will squash them to 2D where as his approach will support 3D.”
So, the implementation based on ST_MakePolygon fails while Regina’s, based on ST_BuildArea, passes.
The problem with my approach appears to be in the handling of the “ST_Collect“ing (or “ST_Accum“ulating) of the inner rings. Regina again:
“It has to do with this annoying behavior of ST_MakePolygon and hmm ST_Accum that when the second argument is null it nullifies everything and I think ST_Accum is returning null when no result.
Now if you changed your function to this — it would work fine and ARRAY is faster than ST_Accum anyway.
Also you really don’t need the subselects so I took that out as well.”
So, the new filter_rings based on ST_MakePolygon becomes:
Of course, having a filter function for polygons is great, but many real-world definitions of area features involves multiple outer shells. So, we really need an implementation that handles mutli-polygons. Let’s try the following.
In summary, I think Regina’s suggested change to use ST_BuildArea is neater and cleaner than my use of ST_MakePolygon. But my final solution is a combination of both.
4. Performance
What is now needed is some performance analysis of Regina and my approaches especially given Regina’s comment alluding to performance.
I have a dataset which contains some very complicated rural polygons with multiple inner rings so I thought I would test using that. Here are the two test SQL statements:
-- Test SQL 1
SELECT upgis_filter_rings(a.geom,10000)
FROM rural_areas a
WHERE GeometryType(geom)='POLYGON'
ORDERBY random()
LIMIT1000;
-- Test SQL 2
SELECT filter_rings(a.geom,10000)/* , and ms */
FROM rural_areas a
WHERE GeometryType(geom)='POLYGON'
ORDERBY random()
LIMIT1000;
I also recompiled filter_rings to include the fuller multi-geometry handling and ran the last SQL statement again. The relative performance can be seen in the table below.
Function
Run 1 ms
Run 2 ms
upgis_filter_rings
77453
76500
filter_rings
23781
23890
(multi) filter_rings
86563
87625
Clearly the single polygon-handling ST_MakePolygon-based filter_rings performs faster than the ST_BuildArea-based upgis_filter_rings function. Converting filter_rings so that it handles multi-polygons pushes the performance out: this is probably due to the use of ST_BuildArea and the multi-polygon disaggregation/aggregation. I suspect changing Regina’s upgis_filter_rings to be multi-polygon aware would push the performance of that function out even further.
My attempt at converting Regina’s function is:
CREATEORREPLACEFUNCTION upgis_filter_rings(geometry,FLOAT)RETURNS geometry AS
FROM(SELECT(ST_DumpRings(ST_GeometryN(ST_Multi($1),/*ST_Multi converts any Single Polygons to MultiPolygons */
generate_series(1,ST_NumGeometries(ST_Multi($1)))
))).*
) b
WHERE b.path[1]=0OR
(b.path[1]>0AND ST_Area(b.geom)> $2)
) c
) d
$$
LANGUAGE'sql' IMMUTABLE;
Testing against the rural_areas table showed the function ran at 147281 (Run 1) and 144813 (Run 2) ms. Regina was right: upgis_filter_rings is slower because of its use of ST_BuildArea.
Summary
In summary, this has demonstrated to me that PostGIS is very functionally rich in the handling of basic simple features geometries. It has more construction and editing tools than any of the current commercial products though many of those have functionality that PostGIS still does not have.
I hope this article gives, the reader, some idea as to the direction you could take…
Comment [1]
I think you have problem with multi-polygons.
In 3. Multi-Polygons
“ST_BuildArea” is extra in
“SELECT ST_BuildArea(ST_Collect(d.built_geom)) as filtered_geom” row.
Comment [1]
I think you have problem with multi-polygons.
In 3. Multi-Polygons
“ST_BuildArea” is extra in
“SELECT ST_BuildArea(ST_Collect(d.built_geom)) as filtered_geom” row.
— Jevgeni · 9 March 2010, 19:44 · #