Model Data Using Constraints, not Types
Data in many languages and databases has types. For example, in C++, we might write:
class Order {
unsigned int num_items_ordered;
float discount_percentage;
int address_pin_code;
};
I think a better style is based on constraints:
class Order {
num_items_ordered >= 1 && integer;
discount_percentage >= 0 && <= 100%;
address_pin_code >= 0 && <= 999999 && integer;
}
Why is this better?
First, it puts the focus on what matters more. For example, if you have a discount percentage of 150%, it’s a problem, because it means you’re paying people to order. Whether it’s an integer (150%) or float (150.5%) is secondary. But when you think in terms of types, your attention is focused on this less important aspect, while missing the important one.
Second, you should think in terms of your business logic, and leave it to the language to translate that into implementation details like how many bytes are used to store the integer. You should think top-down, not bottom-up in terms of implementation types. In fact, that’s the whole reason why we use high-level languages: to think in terms of business logic and not the hardware.
Third, constraints can model data more accurately than types. For example, saying that num_items_ordered is an unsigned int still leaves open the possibility that it’s zero, which implies that the customer can order zero items, which doesn’t make logical sense. Most mainstream programming languages don’t have a built-in positive integer data type. As a second example, if a petrol station models the amount of fuel dispensed as a float, that permits a negative amount of fuel dispensed. Mainstream languages don’t have an unsigned float data type either. As a third example, it’s better to say that the pin code should be <= 999999 than to say that it’s stored using a certain number of bits, because the constraint is in terms of decimal rather than binary digits.
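C++ has no constraint syntax like the one above, but the idea can be approximated today with a small wrapper that checks a range whenever a value is constructed. What follows is only a minimal sketch, assuming C++17. The names Bounded, is_valid_order and usage_example are hypothetical, not taken from any library, and whether a field is an integer is still expressed through the underlying type.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Hypothetical helper: a value of type T that is guaranteed to lie in [Min, Max].
// The bounds are long long so the sketch compiles as C++17
// (floating-point template parameters would need C++20).
template <typename T, long long Min, long long Max>
class Bounded {
public:
    Bounded(T value) : value_(value) {
        // Written as a negated "in range" test so that NaN is rejected too.
        if (!(value >= static_cast<T>(Min) && value <= static_cast<T>(Max))) {
            throw std::out_of_range("value violates its constraint");
        }
    }
    T get() const { return value_; }

private:
    T value_;
};

struct Order {
    Bounded<int, 1, std::numeric_limits<int>::max()> num_items_ordered;  // >= 1
    Bounded<double, 0, 100> discount_percentage;                         // 0% to 100%
    Bounded<int, 0, 999999> address_pin_code;                            // six decimal digits
};

// Returns true if the three values satisfy every constraint on Order.
bool is_valid_order(int items, double discount, int pin_code) {
    try {
        Order order{items, discount, pin_code};
        (void)order;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

void usage_example() {
    assert(is_valid_order(3, 12.5, 560001));
    assert(!is_valid_order(0, 12.5, 560001));   // zero items is rejected
    assert(!is_valid_order(3, 150.0, 560001));  // a 150% discount is rejected
}
```

The check runs at run time here. A language with constraints built in could, in principle, reject some of these values at compile time instead.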
This concept applies not just to programming languages, but also to databases. I’d rather write:
Table orders {
num_items_ordered >= 1 AND integer
discount_percentage >= 0 AND <= 100%
pin_code >= 0 AND <= 999999 AND integer
}
than:
CREATE TABLE orders (
num_items_ordered INT NOT NULL,
discount_percentage FLOAT NOT NULL,
delivery_address STRING NOT NULL
);
In the first case, the constraints implicitly say that the value is not null, because null would not pass the constraint of being greater than or less than a threshold.
You might wonder what would happen if you’re writing performance-sensitive code where the implementation details matter. You can still handle that using constraints. Instead of:
short x;
you can write:
x >= -32768 && <= 32767;
This gives the compiler all the information it needs to model it in two bytes.
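To illustrate that a range really does determine the storage size, here is a rough sketch in C++14 or later that derives the narrowest signed integer type from a pair of bounds. SmallestInt is a hypothetical name made up for this example. A compiler for a constraint-based language would presumably do something similar internally.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical metafunction: the narrowest signed integer type that can hold [Min, Max].
template <long long Min, long long Max>
using SmallestInt =
    std::conditional_t<(Min >= INT8_MIN && Max <= INT8_MAX), std::int8_t,
    std::conditional_t<(Min >= INT16_MIN && Max <= INT16_MAX), std::int16_t,
    std::conditional_t<(Min >= INT32_MIN && Max <= INT32_MAX), std::int32_t,
    std::int64_t>>>;

// x >= -32768 && <= 32767 is stored in two bytes.
SmallestInt<-32768, 32767> x = 0;
static_assert(sizeof(x) == 2, "the range fits in two bytes");

// A six-digit pin code does not fit in two bytes, so four are chosen.
static_assert(sizeof(SmallestInt<0, 999999>) == 4, "a pin code needs four bytes");
```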
So constraint-based modeling does not mean sacrificing performance.