To Constrain or Not to Constrain: There should be NO Question
In my consulting travels I, like many other database-loving specialists, find databases with few constraints (a primary key, perhaps) or none at all applied to the data model managed by the relational database (which may be Oracle, SQL Server, PostgreSQL, MySQL, etc.).
I recently finalised a report for a Government Department here in Australia where this was the case in their Oracle databases, so I included the following comment in my final report, expecting that it would be ignored:
I also recommended that database constraints (documented elsewhere) be applied to their data models.
I have found such issues, and made many similar observations and recommendations, with other clients in the past, but such recommendations rarely get implemented. The reason probably relates to database technology being treated as a “foreign tool” by everyone except the IT department’s DBA or a specific application data modeler. Certainly databases are not something IT programmers or GIS professionals understand well: if they have to work with a database, GIS professionals mostly seem to prefer to do so via some GIS tool like ArcCatalog. Whatever the reason, the database is the poor cousin in the data management stack of most applications.
While this article is, strictly speaking, not about GIS, some GIS products do what other commercial applications in the normal IT space do: that is, deliberately not implement constraints for their data models using the host database’s constraint system. Rather, they prefer to implement their model constraints in their own database (metadata) tables and apply those constraints in their own middle- or client-tier technologies.
In all cases the assumption seems to be made that:
But this doesn’t justify a separate metadata system. Why not constrain the database’s data model (say, using the ANSI-standard INFORMATION_SCHEMA) and build application rules on the fly by “mining” this catalog?
(Sure, some things will be missing, eg spatial “referential integrity” constraints like “pipe falls at end of pipe”, but it is only these that should be stored in custom metadata tables.)
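As a sketch of this “mining” idea — using Python and SQLite purely for illustration (on Oracle or SQL Server you would query the data dictionary or INFORMATION_SCHEMA views instead; the table and column names here are invented) — the constraint is both discoverable from the catalog and enforced by the database itself:

```python
import sqlite3

# SQLite used purely for illustration; Oracle/SQL Server expose the same
# information through their dictionary / INFORMATION_SCHEMA views.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE pipe (
        pipe_id   INTEGER PRIMARY KEY,
        diameter  REAL NOT NULL CHECK (diameter > 0),
        material  TEXT NOT NULL CHECK (material IN ('PVC', 'STEEL', 'COPPER'))
    )""")

# "Mine" the catalog: the rules are stored alongside the data, so no
# application source code is needed to discover them.
ddl = db.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'pipe'").fetchone()[0]
print("CHECK (material" in ddl)   # the rule is visible in the catalog

# The same rule is prescriptive: the database rejects bad data itself.
try:
    db.execute("INSERT INTO pipe VALUES (1, 150.0, 'WOOD')")
except sqlite3.IntegrityError:
    print("rejected by the database, not by application code")
```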
Before I go any further: I don’t like “either/or” arguments. Data quality is not about a single process in which model-specific constraints are applied at only one level, e.g. the application. Data quality should be the goal of the whole of an application’s technology stack.
But having said this, almost all we do in scientific computing is collect data within one application but modify, manage, analyze and present it with other applications. The common foundation on which all application activity starts and ends is usually a single, (independently) quality-controlled database. If there is any tier in the stack that should be clearly and fully documented and constrained, it is the DATA TIER. Or, to put it another way:
I will try and explain the reason for constraining from two perspectives:
1. Relational Database Theory
What is meant by “self-referential” is that an application should not be required to impart meaning to the data being stored and managed. To this end, all commercially available database software, such as SQL Server and Oracle, includes a system “catalog” that controls and documents all data stored within it. This metadata documents all tables stored in the database, their column names and types, the valid data values they may contain, any indexes that may be defined on sets of columns, and the relationships between tables. The catalog also controls access to these tables by applications and users.
This approach to database management removes the need to ask a programmer to access application source code to answer basic end-user questions relating to database structure, quality and integrity.
Database metadata is both descriptive and prescriptive. Thus the definition of a “rule” controlling data values in a specific table/column – e.g., a column describing a percentage value should always fall “BETWEEN 0 AND 100” – succinctly describes data quality but, once applied, will also stop values other than those defined from being entered.
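The percentage rule above can be made concrete in a few lines — again using Python and SQLite only as a stand-in for a real server database, with an invented table name:

```python
import sqlite3

# Hypothetical table illustrating the BETWEEN 0 AND 100 rule from the text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE survey (id INTEGER PRIMARY KEY, "
           "completed_pct REAL CHECK (completed_pct BETWEEN 0 AND 100))")

db.execute("INSERT INTO survey VALUES (1, 42.5)")        # within the rule: accepted
try:
    db.execute("INSERT INTO survey VALUES (2, 250.0)")   # violates the rule
except sqlite3.IntegrityError:
    print("250.0 rejected: the rule is prescriptive, not just documentation")
```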
This was particularly evident during a discussion I once had with staff in an organization about the range of correct values for a SURFACE_LEVEL column in a table called PITS. The range of values actually held by the column was far greater than staff expected. Examination of the associated DATUM column shed no light on the small number of exceedingly large values. In fact, the DATUM column contained mixed-case values (e.g., “AHD” and “ahd”) and a variety of possibly identical references (is “AGD/AHD” supposed to be the same as “AHD” and “ahd”?). These columns should have had constraints similar to the following defined and applied:
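A hedged reconstruction of what such constraints might look like — the permitted level range and datum codes below are assumptions, not the organization’s actual rules, and SQLite stands in for Oracle (where you would typically use ALTER TABLE … ADD CONSTRAINT on the existing table):

```python
import sqlite3

# Illustrative only: the SURFACE_LEVEL range and DATUM codes are assumed
# values chosen to show the shape of the constraints, not the real rules.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE pits (
        pit_id        INTEGER PRIMARY KEY,
        surface_level REAL NOT NULL
            CHECK (surface_level BETWEEN -100.0 AND 2500.0),
        datum         TEXT NOT NULL
            CHECK (datum IN ('AHD', 'AGD'))  -- one upper-case code per datum
    )""")

db.execute("INSERT INTO pits VALUES (1, 10.0, 'AHD')")      # accepted
try:
    db.execute("INSERT INTO pits VALUES (2, 10.0, 'ahd')")  # mixed case
except sqlite3.IntegrityError:
    print("lower-case datum rejected")
```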
Once constraints such as these have been defined, applications should use these rules when providing access to the database by “mining” the metadata: this “data-driven” approach to application development provides the most flexible method for building and deploying applications that use relational databases. It enables a “Model Driven Architecture” (MDA) approach to software development that reduces application development time by removing the need for developers to hand-code basic “rule-driven” data entry and validation.
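One small sketch of that data-driven idea (Python and SQLite again, with an invented table and a naive regular expression — a real implementation would query the server’s dictionary views rather than parse DDL text): a data-entry form can build its picklist of allowed values straight from the rule stored in the catalog, so the client can never drift out of step with the database.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pipe (material TEXT "
           "CHECK (material IN ('PVC', 'STEEL', 'COPPER')))")

def allowed_values(db, table, column):
    """Mine a column's IN-list CHECK constraint out of the catalog."""
    ddl = db.execute("SELECT sql FROM sqlite_master WHERE name = ?",
                     (table,)).fetchone()[0]
    m = re.search(rf"{column}\s+IN\s*\(([^)]*)\)", ddl, re.IGNORECASE)
    return [v.strip().strip("'") for v in m.group(1).split(",")] if m else []

# The picklist comes from the database rule, not from hard-coded UI values.
print(allowed_values(db, "pipe", "material"))  # ['PVC', 'STEEL', 'COPPER']
```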
2. Database tuning and performance
Normally, this is asserted to be a “database best practice”. But such an assertion is hard to prove, as there is little in the way of a collated Database Book Of Knowledge (DBOK) on the Internet whose findings are based on empirical evidence. This may be because the complexities of individual data models make it hard to create standardized, model-independent use cases; add to this the complexities of the different commercial databases’ query optimizers and architectures. Finally, even if a database were constrained and those constraints improved database performance, who would know other than the DBA and maybe the odd programmer or user? Databases are the hidden, unsung heroes of any organization: no one blows their trumpet.
So, imagine my pleasure when I got back from some consulting interstate to see the May – June 2009 edition of the Oracle Magazine contained a great article on just this issue by Tom Kyte called On Constraints, Metadata, and Truth.
If there is one person who, for Oracle professionals, is a walking “book of database best practice”, it is Tom Kyte. Tom always bases his recommendations on solid, empirical research. So it is with pleasure that I let Tom conclude this article on how a constrained data model can aid database performance.
In case you think that constraints help performance only for Oracle, see Sarvesh Singh’s article showing how Check and Foreign Key Constraint Improves Query Performance for SQL Server.
I hope someone finds this article useful.