When defining datatypes in a database, I have always had a problem with choosing whether to use integers or strings to store certain 'numerical' data.
Say I am bui
For a postal code I would choose a string. It is not intrinsically an integer. It is just an identifier for something and it could just as well have been a series of four characters.
As for the number of files inside a torrent, that should be an integer.
I see no problem with storing a zip code as a number even if you don't expect to perform math operations on it.
In our corporate data warehouse, we are the recipients of data from many legacy systems. As a result, we see a lot of garbage data being used.
Take our case where we have a Geographical identifier that is a zero filled 4-digit "numeric" value. This field is often used to join tables together.
I would take one of two approaches: 1) declare the column as a char field of length 4 and add a CHECK constraint such as LIKE '[0-9][0-9][0-9][0-9]'; 2) define it as a numeric of length 4 and, if the users want it, format the value only when displaying it.
Approach #1 saves you the hassle of constantly formatting, which is no big deal, but if you are often filtering, indexing, or joining on the column, I'd say we're better off with option #2.
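To make option #2 concrete, here is a minimal sketch of "store as a number, format only when displaying". The function name format_geo_id and the 0 to 9999 range are my own assumptions for illustration, not anything from a real system.

```python
def format_geo_id(value: int) -> str:
    """Render a numeric identifier as a zero-filled 4-digit string for display.

    Hypothetical helper: the 4-digit width mirrors the example above.
    """
    if not 0 <= value <= 9999:
        raise ValueError(f"identifier out of range: {value}")
    return f"{value:04d}"


print(format_geo_id(42))    # 0042
print(format_geo_id(1234))  # 1234
```

The stored value stays a plain integer, so joins and indexes work on the number and the zero fill is purely a presentation concern.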
A third reason is that, in my experience, people are just plain lazy when it comes to adding constraints to a database, or they are ignorant. I think it is more laziness, personally. I find the constraints that do exist are mostly applied as edits in the application which originally captures the data, and these edits are not applied uniformly.
As a result, our data warehouse ends up receiving all sorts of variations, including inconsistent pre-filling with zeros or justification of the value.
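As a rough illustration of the cleanup this forces on the receiving side, the sketch below normalizes the kinds of variations described (missing zero fill, left or right justification with spaces) back to one canonical 4-character form. The name normalize_geo_id and the sample inputs are hypothetical.

```python
def normalize_geo_id(raw: str) -> str:
    """Strip padding spaces and re-apply a 4-digit zero fill.

    Hypothetical helper; raises ValueError for anything that is not
    one to four ASCII digits once the surrounding spaces are removed.
    """
    stripped = raw.strip()
    if not (stripped.isascii() and stripped.isdigit()) or len(stripped) > 4:
        raise ValueError(f"not a valid identifier: {raw!r}")
    return stripped.zfill(4)


# Three spellings of the same identifier collapse to one join key.
for raw in ["42", "  42", "0042 "]:
    print(normalize_geo_id(raw))  # 0042 each time
```

Without a step like this, "42" and "0042" are different strings and a join on the raw column silently drops rows.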
When you define something as an INTEGER, you automatically get more efficient storage, especially when indexing on the column, and an edit which everyone understands and which is more likely to be applied consistently across legacy systems by database designers of various abilities.
I have no problem with option #1, except when the field is used in an index, and my concern is that once you accept a field as being alphanumeric, people tend to throw more junk into it.
Take, for example, our PeopleSoft employee identifier. Somebody decided to add an "X" in front of a 6-character zero-filled employee "number" to designate that the employee is a contractor. This violates a personal practice of mine not to combine separate pieces of information into a single field. This caused all sorts of inconsistency problems across various systems. If this field were a numeric, no one would've tried to do that.
Comments?
You should only use numeric fields if you have to perform arithmetic operations on those fields. Otherwise just go with string/varchar/etc.
This is a question of semantics. You are trying to decide the appropriate datatype for storage which can be a tricky question. The best rule of thumb is to store your data as integers if you will need to use the data as an integer.
In other words, since you will never be using a postal code as a number it does not make sense to store it as one. It doesn't matter what the data looks like, it matters what it is. Is a postal code a number? No, it's a string of characters that just happens to be made up of wholly numeric characters. Therefore a postal code is best stored as a string.
Post code is not a number: it's a code or identifier. The same applies to phone numbers.
Number of files in a torrent is an integer.
Not least, in this case you can create a CHECK constraint such as LIKE '[0-9][0-9][0-9][0-9]' to keep the data correct at the database level.
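Here is a small, self-contained sketch of such a constraint, run against an in-memory SQLite database from Python. Two assumptions to flag: the table and column names are made up, and I have swapped LIKE for GLOB, because the bracketed ranges in the LIKE pattern are T-SQL syntax and SQLite's LIKE does not support them, whereas its GLOB operator does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        -- Exactly four characters, each one a digit 0-9.
        postal_code TEXT NOT NULL
            CHECK (postal_code GLOB '[0-9][0-9][0-9][0-9]')
    )
    """
)


def try_insert(code: str) -> bool:
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO address (postal_code) VALUES (?)", (code,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("0042"))  # True
print(try_insert("42"))    # False, too short
print(try_insert("X042"))  # False, contains a letter
```

The point is that the rule lives in the database itself, so every application writing to the table gets the same edit.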
I always use the following rule:
If you plan on performing mathematical calculations on it (adding/subtracting/etc) make it an integer or other numerical data type.
If you do not plan on performing any types of mathematical calculations on the field, store it as a string.
In the instance of ZIP codes, you should never need to add to a ZIP code, subtract from it, or multiply two ZIP codes together. Mathematical functions generally are not used on ZIP codes because they are used as identifiers and not quantities. Therefore you should store your ZIP code as a string datatype.
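The rule can be sketched in a few lines. The class and field names below are invented for illustration, and "02134" is just a sample value with a leading zero.

```python
from dataclasses import dataclass


@dataclass
class Torrent:
    name: str
    file_count: int  # a quantity: summing or averaging it is meaningful


@dataclass
class Address:
    street: str
    zip_code: str  # an identifier: never added, subtracted, or multiplied


torrents = [Torrent("first", 3), Torrent("second", 12)]
total_files = sum(t.file_count for t in torrents)
print(total_files)  # 15

home = Address("1 Example Street", "02134")
print(home.zip_code)       # 02134
print(int(home.zip_code))  # 2134, the leading zero is gone
```

Arithmetic on file_count gives a useful answer, while converting the ZIP code to a number only loses information.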