April 22, 2008 15:55 by beefarino
Yesterday morning, my colleagues and I were having a discussion about all the different string representations and abstractions we've had to work with over the course of our lives as programmers. Here's the list I came up with, in rough chronological order of my exposure to them:
- TRS-80 BASIC - I have no idea how they were represented in memory because I've never gone back to that platform, but I assume it was just another character array. If someone knows the specifics I'd LOVE to learn about it.
- Borland Turbo PASCAL character arrays
- C/C++ char* / char[]
- Win32 LPSTR, LPCSTR, and all the other constructs from the Win32 API that I had to learn about because Owen and Michael would insist on leaving STRICT defined (thanks guys!).
- MFC CString
- JavaScript string objects
- Perl $scalars - a string, a number, a reference, or all three, or perhaps none of those. I once dug deep into Perl internals; I could tell you more than you care to know about scalars, memory management, and type conversion inside the Perl interpreter. Of course, show me some of the Perl I hacked up a few years ago and I won't be able to tell you what it does...
- C/C++ wchar_t* / wchar_t[]
- Win32 TCHAR* / TCHAR[] - yes, technically the same as either char* or wchar_t*, thanks for not commenting about it.
- BSTR - WTFBBQ?! OIC - it's a pointer to the MIDDLE OF THE FRACKING STRING STRUCTURE so I have to do pointer calculus to determine the length of the string and suck out the relevant bytes.... well, thank goodness there's:
- _bstr_t - ok, a bit friendlier, but I'm still glad that there's:
- CComBSTR - ah, a Length() method!
- VARIANT - *sigh* .... the lengths to which I went to pacify OLE Automation lust.
- _variant_t - ignored in favor of:
- CComVariant - use only when necessary, follow each use with a thorough handwashing.
- PHP strings - never learned the internals of PHP. I assume it operates on the same type of abstraction as the Perl scalar - anyone know for sure?
- Java string objects - took some getting used to. Why can I + two strings, but not two Matrices? How does it make sense that a base Object returns an instance of a derived type String from Object.toString()? See what happens when an active mind is no longer consumed with memory and pointer management?
- .NET string objects
That's what I came up with in about 5 minutes of gazing longingly over my geek life. I'm sure there are others - my list doesn't include all of the one-off custom implementations I've made, or the third-party tools we used to use for cross-platform application development, or stuff like XML tokens, entities, etc.
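Since the BSTR entry drew the loudest complaint, here is a rough sketch of the layout behind it: a 4-byte length prefix holding the byte count, then the characters, then a terminator, with the pointer you are handed aimed at the first character rather than at the start of the block. This is a toy model in portable C++, not the real OLE allocator; toy_alloc, toy_len, and toy_free are hypothetical names standing in for SysAllocStringLen, SysStringLen, and SysFreeString.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Toy model of the BSTR memory layout (an illustration, NOT the OLE allocator):
//
//   [ 4-byte byte count ][ UTF-16 code units ... ][ 2-byte terminator ]
//                          ^ the pointer you are handed points here

// Hypothetical stand-in for SysAllocStringLen. `src` must hold `len` code units.
char16_t* toy_alloc(const char16_t* src, std::uint32_t len) {
    const std::uint32_t bytes =
        static_cast<std::uint32_t>(len * sizeof(char16_t));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof bytes + bytes + sizeof(char16_t)));
    if (block == nullptr) return nullptr;
    std::memcpy(block, &bytes, sizeof bytes);                        // length prefix
    std::memcpy(block + sizeof bytes, src, bytes);                   // character data
    std::memset(block + sizeof bytes + bytes, 0, sizeof(char16_t));  // terminator
    return reinterpret_cast<char16_t*>(block + sizeof bytes);        // skip the prefix
}

// Hypothetical stand-in for SysStringLen: step back over the prefix to read it.
std::uint32_t toy_len(const char16_t* s) {
    if (s == nullptr) return 0;  // a null string counts as empty
    std::uint32_t bytes = 0;
    std::memcpy(&bytes,
                reinterpret_cast<const unsigned char*>(s) - sizeof bytes,
                sizeof bytes);
    return static_cast<std::uint32_t>(bytes / sizeof(char16_t));
}

// Hypothetical stand-in for SysFreeString: free from the true start of the block.
void toy_free(char16_t* s) {
    if (s != nullptr) {
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
    }
}
```

Calling toy_alloc(u"hello", 5) and then toy_len on the result walks four bytes backwards to find the length, which is exactly the "pointer calculus" that the wrapper classes hide.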
I created the list for fun, but it's got some pretty interesting aspects to it. For one, the same basic string construct that I learned on that TRS-80 never really changed. Sure, they're immutable objects now, but really their representation and purpose have persisted since my dad brought home that fat grey box with the tape drive and keyboard that sounded like a hole punch.
Second, it's made me realize that I take a lot of stuff for granted these days. Here's some code from somewhere between items 8 and 9 on my list:
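    TCHAR *psz = new TCHAR[ iStringSize ];
    if( NULL == psz )
    {
        // Only reachable where new returns NULL on failure (pre-standard
        // compilers such as Visual C++ 6); standard new throws std::bad_alloc.
        return E_OUTOFMEMORY;
    }
    // ...
    delete[] psz;
    psz = NULL;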
Even writing this as an example makes me very nervous. A few years ago I wouldn't have batted an eye, but these days it feels like a lot of work to pull all of the allocation, pointer management, and deallocation together. And this example really doesn't account for all the things that could go wrong...
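For contrast, here is a minimal sketch of how the same buffer might be handled in standard C++ today, assuming std::wstring is an acceptable stand-in for the TCHAR array. The helper name make_buffer and its bool return value are made up for illustration; the original code would have returned an HRESULT.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper, not from the original code: builds a zero-filled
// buffer of `size` wide characters and hands it back through `out`.
// Returns false where the old snippet would have returned E_OUTOFMEMORY.
bool make_buffer(std::size_t size, std::wstring& out) {
    try {
        std::wstring buffer(size, L'\0');  // allocates; throws on failure
        // ... work with buffer.data() where the old code used psz ...
        out = std::move(buffer);           // ownership moves; no delete[] needed
        return true;
    } catch (const std::bad_alloc&) {
        return false;                      // out of memory
    } catch (const std::length_error&) {
        return false;                      // size exceeds what the string can hold
    }
}
```

The string's destructor releases the memory on every exit path, including early returns and exceptions, so the delete[] and the reset to NULL have no counterpart here.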
So I have to say that, all strings considered (*groan*), I'm pretty content with the state of the art.