Thursday, November 22, 2012

varchar vs nvarchar (and text, ntext, char, nchar)

Everyone who's ever created a table in SQL has probably at some point wanted to add a field to store text values and wondered about the difference between all the various options. These are numerous. The most commonly used are varchar, nvarchar, char, nchar, text, and ntext. You could also store text in binary, nbinary, or image columns. Actually you could store text in almost any data type if you're willing to put yourself through the hassle of very complicated conversions.  Let's stick with the original six for now: varchar, nvarchar, char, nchar, text, ntext.

The n prefix denotes unicode (more about this later), so ignoring that, we're dealing with three data types here: varchar, char, and text. Varchar is by far the most commonly used and the rest of this article will focus on this type, but in short these three can be differentiated as follows:

  • char: Fixed length string. In other words, char(4) will always have four characters. Thus if you want to store "A" it will become "   A" (A padded with 3 spaces). This is not used very often, but can be useful in certain scenarios. 
  • text: This stores the data in a blob (i.e. not in the row data of the SQL table). This is useful when you expect to store huge text values as there's no limit on the size of the data. However, this type is being deprecated and is being replaced (sort of) by varchar(max).
  • varchar: Up to 8000 characters this stores text in the row data (similar to char) but not always of fixed length. In other words if you create a varchar(4) column and store a value "A" in it, this will use just one byte to store the A. AA will use two bytes, AAA will use 3 and so on. This implies that the length of various rows in the table will vary from one row to the next. varchar always uses an additional 2 bytes for storage, essentially for meta data so that SQL server knows how much data to expect in each specific instance. That means if you know you will always need exactly 4 characters, char is much more efficient in terms of storage requirements and processing overhead, since it doesn't need the additional 2 bytes of meta data nor any processing power to interpret that meta data. Having said that, in the majority of modern applications, this type of gain from using char instead of varchar is trivial. In the olden days when memory, processor cycles, and storage space were more limited, this was a serious consideration though. In previous versions of SQL Server, varchar could only go up to 8000 characters, but as of SQL 2005 varchar(max) will allow you to store any amount of data. This works like varchar described above up to the 8000 character limit and once data size goes over this it switches over to a blob to store the data and behaves more like the text data type (presumably this is also why the text data type is no longer necessary and being deprecated).
Now, on to varchar vs nvarchar. These two types are essentially the same, except that varchar is non-Unicode and nvarchar is Unicode. What does this mean?

Unicode is a standard for encoding text that was specifically designed to allow for storage of a larger number of different characters and character sets to cater for the requirements of different languages. The wikipedia article for Unicode describes this in more detail.

In SQL Server, this means nvarchar supports multiple languages. If your data is only going to be in English, this is not necessary. If there's any chance that your data will contain other language data, especially languages using non-English character sets, like Arabic, Russian, etc., etc. then you will need to use nvarchar. The cost of using nvarchar is that it almost doubles the amount of storage used to store the data as it uses two bytes for each character of data, where varchar uses only one byte per character. Both have the same overhead of 2 characters, so effectively varchar will use 3 bytes to store "A" and nvarchar will use 4 bytes for the same thing. To store "QWERTY", six characters of data, varchar will use 8 bytes (2+6) and nvarchar will use 14 bytes (2+6*2). 

My recommendation: If you're absolutely 100% sure that you will not need non-English data, and you will not need it in the foreseeable future either, then use varchar. In all other cases, use nvarchar.

Monday, November 12, 2012

Software I can recommend

This list isn't meant to be exhaustive, but I wanted to highlight a few things I use regularly that I can highly recommend. I also use a lot of other stuff (like MS Office) that is really good, but most people already know about those, so these are the lesser known ones that I think is worth a try:

  • Pixlr: Great for quickly editing a picture or logo. Works like a charm, completely web-based, no install necessary. And it's free.
  • VIPRE: Vipre is my anti-virus of choice. Over the years I've tried various products including TrendMicro, AVG, Norton, and McAfee, but VIPRE stands out. I haven't had any issues with viruses since I started using it about 2 years ago. It's really lightweight and resource-friendly compared to some of the others, and I've been very impressed with their responsive support, especially on chat and e-mail.
  • Xara: Not one of the mainstream design software names, but very good and competitively priced. I haven't used it very extensively, but from what I've seen it does the job and comes in much cheaper than it's main competitors. I've also received good support from them on something that they really didn't need to help with.
  • Google apps: Perhaps a bit too mainstream for this list, but it's so good that I really have to mention it. Xpedia started using Google apps a few years ago and I cannot recommend it highly enough. I use forms, spreadsheets, and sites regularly and I simply don't think there is an easier/cheaper way of doing what you can do with these tools. The mail is fantastic too. A lot of our users still use Outlook, but I don't recommend this because the Gmail web interface is just so good that it isn't necessary to have a mail client anymore. While Word/Excel are more feature rich than the comparative tools in Google apps, the ease of collaboration and sharing in Google apps counters this nicely. I use a combination of both, so it depends on what you need to achieve, but I think a lot of organizations can get great value from Google apps. The aforementioned tools are just a fraction of what you get when you sign up for Google apps though, so well worth checking out. And hey, if you're small enough, it's free!
  • Spybot S&D: Spybot is a great tool for keeping your PC free of malware and spyware. I've been using it for years and it really is a great project. Easy to use and works well with most AV products.
  • Mozy: I've been using Mozy's backup service for years and the backup part has always impressed me. It's easy to use and set up and doesn't interfere with your daily work (i.e. not a resource hog). Then I had a crash recently and had to "test" restoring. This could not have been easier. What a pleasure. All my backups, 100% up to date, available easily and immediately.
  • Red-Gate: More specifically Red-Gate's SQL tools. The ones I use often are Compare and Data Compare, but every one of their tools that I've tried has been simply fantastic. Easy to use software that just works. For people in the MS SQL world this is probably also considered mainstream, but in case you haven't tried it, do yourself a favour.
  • WinRar. Yes, I'm one of those people who actually bought a compression agent! WinRar is really good though. Supports a huge list of compression file formats, works quickly and well.

Monday, November 5, 2012

The cloud: Hype or real?

The ever-increasing hype about the cloud is getting to me. My personal experience is that a lot of people in our industry love throwing around the buzzword, but simply have no idea what it means. To dispel some  myths, I recently (ok it was months ago, but I'm only getting around to blogging about it now...), did some research and gave an internal presentation at one of our company functions.

I've made the slideshow accompanying my presentation available for download here. Some of the slides are very specific to our business and these were designed to accompany my talk so it isn't intended as a comprehensive narrative in itself, but hopefully something that can be of use to readers anyway.

The highlights:

  1. The term "cloud" actually comes from network diagrams that used clouds to represent portions of the network that did not need to be known for purposes of the diagram. In this context "cloud" means something like "Unknown magic happens here".
  2. In most modern contexts "the cloud" typically means the Internet or some subset thereof.
  3. A lot of people seem to refer to cloud when they actually mean cloud computing. This can be defined as data, services, or applications hosted on anonymous Internet-based infrastructure. Peder Ulander of Citrix gives a more comprehensive and pretty good description in this video: http://www.youtube.com/watch?v=C4krCK1PZHY
  4. Cloud computing can be categorized by service type as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS). 
  5. While PaaS (e.g. Azure, Amazon Web Services, etc.) is fairly new, IaaS and SaaS are as old as the Internet itself. IaaS is really nothing but hosting services and web hosting companies have been doing this for years. It has changed and evolved over time though and there is a lot of new stuff out there in this space, but it's still the same concept. Earliest SaaS examples are probably web mail services like hotmail. This field is growing very quickly with ever more advanced apps. Good examples of business applications in this area are Salesforce.com, Netsuite, and even our very own NuQ HR/Payroll system (see www.nuq.co.za).
  6. Cloud computing has advantages and disadvantages. 
    1. For me, the biggest advantage is not having to worry about infrastructure. Your application can scale with your business without you even having to think about it, much less buy dedicated hardware and software to support it. This is based purely on my personal experience and probably more relevant to SaaS than IaaS or PaaS.
    2. The biggest disadvantage (again only in my personal experience) is that support is often impersonal. Services are designed to scale well and that means personal service is usually not a priority. When everything works well this isn't an issue, but when you do have a problem that really affects your business this can be extremely frustrating.
In conclusion, the cloud is just hype, but it's also real. 

It's hype, because it's existed for far longer than the cliched terminology that the buzzword brigade so love to use today. There's nothing new about the concept and it's simply an ongoing evolution of services available on the Internet. It's also very real, because this evolution has reached a point where the services on offer are becoming rapidly more useful in our day-to-day business and personal lives.

In my experience when you meet someone who goes on about the wonders of the cloud, this is a good indication that they've heard the hype, but don't understand what it is. In most cases you can replace "cloud" with "Internet" in their every sentence.