The Rise and Rise of SQL
Structured Query Language (SQL) is one of the longest-surviving languages. It was first described in 1973, adopted as an ANSI standard in 1986 and as an ISO standard in 1987. I first used it in 1992 and was enamored with its simplicity and elegance. In the fifty years since its inception, numerous other languages have appeared and flamed out. What makes SQL so successful? What is the secret to its longevity?
For me, the fundamental reason is that SQL is a declarative language. When we write a query in SQL, we don’t tell the machine how to get the data; we tell it what we want. Declarative languages are inherently agnostic to implementation. The vast number of SQL-speaking database engines demonstrates this: under the covers, each has its own forms of data storage and indexing.
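To make the what-versus-how distinction concrete, here is a minimal sketch that uses Python's standard-library sqlite3 module, with SQLite standing in for any SQL engine. The customers table and its rows are hypothetical. The imperative function spells out the loop, while the SQL query only states the result it wants. Adding an index changes the plan that SQLite reports through EXPLAIN QUERY PLAN, but neither the query text nor its result changes.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
ROWS = [("Ada", "UK"), ("Grace", "US"), ("Linus", "FI"), ("Alan", "UK")]
QUERY = "SELECT name FROM customers WHERE country = ? ORDER BY name"

def imperative_uk_names(rows):
    # Imperative style: we spell out *how* to find the answer, step by step.
    result = []
    for name, country in rows:
        if country == "UK":
            result.append(name)
    return sorted(result)

def declarative_uk_names(conn):
    # Declarative style: we state *what* we want; the engine picks the access path.
    return [name for (name,) in conn.execute(QUERY, ("UK",))]

def query_plan(conn):
    # EXPLAIN QUERY PLAN describes the strategy SQLite chose; column 3 holds the text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("UK",))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", ROWS)

before, plan_before = declarative_uk_names(conn), query_plan(conn)

# Change the physical storage by adding an index; the query text is untouched.
conn.execute("CREATE INDEX idx_customers_country ON customers (country)")
after, plan_after = declarative_uk_names(conn), query_plan(conn)

print(before, after, imperative_uk_names(ROWS))
print(plan_before)
print(plan_after)
```

The exact plan strings differ between SQLite versions, so treat them as indicative. The point is that the storage decision lives in the engine, not in the query.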
Perhaps another reason for its longevity is that SQL is one of the few languages I’ve seen business users work with. It helps that SQL requires no compilation, uses a familiar table-and-column metaphor and English-like syntax, and starts out very simple:
select *
from customers
A final reason I would suggest for its longevity is that the database systems that implement SQL typically expose their data dictionary to SQL as well. So, the same language that is used for querying data is also used to learn about the structure of the data. For example, the following query in Snowflake tells me how many columns there are in the customers table:
select count(*)
from prod.information_schema.columns
-- Snowflake: string literals take single quotes; unquoted identifiers are stored in upper case
where table_name = 'CUSTOMERS'
and table_schema = 'PROD'
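The Snowflake query above needs a Snowflake account to run. As a locally runnable analogue, here is a sketch that uses Python's sqlite3 module. SQLite has no information_schema, but it exposes its catalog through the sqlite_master table and, since SQLite 3.16, through table-valued PRAGMA functions such as pragma_table_info. The customers table here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the post's customers table.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")

# The catalog is itself a table that ordinary SQL can query.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Table-valued PRAGMA functions expose column metadata to a plain SELECT.
(column_count,) = conn.execute(
    "SELECT COUNT(*) FROM pragma_table_info('customers')").fetchone()

print(tables, column_count)  # ['customers'] 3
```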
Despite the success of SQL, for most of its history it was typically used only within the boundaries of an enterprise. That is, one would rarely let a customer or partner run their own queries directly against the database. There are good reasons for this:
- Typical SQL network protocols were not designed to be used securely across the Internet
- Databases typically suffer from the ‘noisy neighbor’ problem: a badly written or expensive query from one user causes performance problems for everyone else
- Some queries are expensive to compute. Since we’re paying for compute, we must prevent abuse (inadvertent or otherwise) of the commons.
To protect ourselves, for the last twenty years we’ve wrapped our databases with intermediate layers that expose secure APIs to our customers. That way we can monitor and control what the customer is doing and more easily throttle excessive use.
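The intermediate layer described above can be sketched as a gateway that keeps one token bucket per customer and answers with HTTP-style status codes. This minimal sketch assumes a token-bucket policy, which is one common throttling scheme and not necessarily what any given vendor uses. QueryGateway, TokenBucket and the run_report callback are hypothetical names. The fake clock in the usage lines only makes the example deterministic.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TokenBucket:
    """Permits bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float]) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class QueryGateway:
    """Hypothetical API layer: customers call named reports, never raw SQL."""

    def __init__(self, run_report: Callable[[str], object], capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.run_report = run_report
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def request(self, customer_id: str, report: str) -> Tuple[int, Optional[object]]:
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            bucket = self.buckets[customer_id] = TokenBucket(self.capacity, self.rate, self.clock)
        if not bucket.allow():
            return 429, None  # Too Many Requests: throttled before touching the database
        return 200, self.run_report(report)

# Usage with a fake clock so the behaviour is deterministic.
now = [0.0]
gateway = QueryGateway(lambda report: f"rows for {report}", capacity=2, rate=1.0,
                       clock=lambda: now[0])
burst = [gateway.request("acme", "orders")[0] for _ in range(3)]  # [200, 200, 429]
other = gateway.request("globex", "orders")[0]                     # 200: separate bucket
now[0] = 1.0
after_refill = gateway.request("acme", "orders")[0]                # 200: one token refilled
```

Keeping the bucket per customer is what lets the layer throttle one heavy user without penalising the others.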
However, in the last couple of years, these assumptions have been challenged. Companies like Snowflake introduced cloud-native architectures that separate compute costs from storage costs, essentially charging by the query. This gave rise to marketplaces that solve issues two and three above. In a modern marketplace, the supplier pays for storage, and each customer runs queries on, and pays for, their own compute. That removes the moral hazard for customers: run an expensive query and you must pay for it.
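The incentive shift can be made concrete with a toy bill split. The rates below are hypothetical round numbers, not any vendor's actual pricing. The sketch only shows the structure: the supplier's storage cost stays flat, and the customer's compute cost grows with the queries they choose to run.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical rates, chosen for round numbers; not any vendor's actual pricing.
STORAGE_RATE_PER_TB_MONTH = 20.0
COMPUTE_RATE_PER_HOUR = 3.0

@dataclass
class MonthlyBill:
    supplier: float  # pays to keep the shared data at rest
    customer: float  # pays for the compute its own queries consume

def split_bill(stored_tb: float, query_seconds: Sequence[float]) -> MonthlyBill:
    storage_cost = stored_tb * STORAGE_RATE_PER_TB_MONTH
    compute_cost = sum(query_seconds) / 3600 * COMPUTE_RATE_PER_HOUR
    return MonthlyBill(supplier=storage_cost, customer=compute_cost)

light = split_bill(stored_tb=2, query_seconds=[600, 1200])
heavy = split_bill(stored_tb=2, query_seconds=[600, 1200, 7200])  # one extra two-hour query
print(light)  # MonthlyBill(supplier=40.0, customer=1.5)
print(heavy)  # MonthlyBill(supplier=40.0, customer=7.5)
```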
I believe that read-only SQL sharing of data with customers, within a marketplace environment, will soon supplant APIs for data vendors. In a subsequent post, I’ll explore why.
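As a small illustration of what read-only SQL sharing means mechanically, here is a sketch that uses SQLite. SQLite can open a database file read-only through a URI filename with mode=ro. It is only a local stand-in: a cloud marketplace would add authentication, per-customer compute and billing on top. The vendor.db file, the prices table and its rows are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

def open_read_only(db_path: Path) -> sqlite3.Connection:
    # mode=ro asks SQLite to refuse writes; uri=True makes sqlite3 parse the "file:" URI.
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)

with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "vendor.db"

    # The (hypothetical) data vendor owns a read-write connection.
    vendor = sqlite3.connect(db_path)
    vendor.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
    vendor.executemany("INSERT INTO prices VALUES (?, ?)", [("ABC", 10.5), ("XYZ", 42.0)])
    vendor.commit()
    vendor.close()

    # The customer may run any query it likes, but cannot modify the data.
    customer = open_read_only(db_path)
    rows = customer.execute("SELECT ticker, close FROM prices ORDER BY ticker").fetchall()
    try:
        customer.execute("DELETE FROM prices")
        write_error = None
    except sqlite3.OperationalError as exc:
        write_error = str(exc)
    customer.close()

print(rows)
print(write_error)
```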