October 2007 Archives

AT&T Data Mining Language - Hancock

AT&T has been lambasted quite a bit lately for their willing cooperation with Federal Law Enforcement in Data Mining efforts. Ignoring, for the time being, the moral and ethical issues surrounding Data Mining against mostly innocent people for the purposes of determining Communities of Interest, there is actually quite a large amount of fascinating research that has gone into this problem space. The initial problem of determining Communities of Interest by tracking who is communicating with whom is deeply rooted in Graph Theory, and I'll be waiting to give my impressions of that research until a later date. Right now, I'm interested in talking about the programming language that AT&T designed to aid in examining the enormous amounts of data they have access to.

Hancock is a domain specific language designed by AT&T as an extension to C. It's not too surprising that AT&T chose to base their new language on C; after all, they invented it. Hancock is a stream-oriented language designed for iterating over vast amounts of regular data, based on patterns. Due to its requirement that data be fairly regular, it's particularly well suited for analyzing information in a database, or potentially some log files. While AT&T is using this language to analyze their long distance phone records and internet traffic, and suggests analyzing banking data as well, I'm going to approach the technology from a significantly less sinister angle: an e-commerce company analyzing what their customers are viewing on their site, for the purposes of populating a "You Might Also Be Interested in..." box on the product pages.

The Hancock compiler, available free for non-commercial use (I guess that kills my proposed idea; still, this is a more academic exercise) from the AT&T Labs site I linked to above, is functionally a two-pass compiler: first it converts any .hc file provided to it from Hancock code to C code, then it sends everything (including any pure C files) to the C compiler for final compilation. This means that all of Hancock's special constructs (views, multi-unions, directories, maps, and pickles) can potentially be used in normal C programs. The hcc compiler serves as a front-end to the C compiler, and any arguments it doesn't recognize will be passed to the C compiler. One interesting option is -C, which causes Hancock to dump the C representation of a .hc file to disk, so you can analyze the file for debugging purposes, or simply to learn how Hancock does what it does. If a person were to create an Open Source version of Hancock, this would be crucial to allowing that work to be done without viewing the source code for Hancock directly. Of course, AT&T does have patents on some of the technology used in Hancock, but like Data Mining Ethics, Software Patents will be a discussion for another day.

As Hancock has a very good manual provided by AT&T, I'm not going to write a full tutorial, but I will be posting code blocks in order to highlight some of what I really like about the language, for its intended purpose. For that code, the following visitRec_t is the type defining a record in a CustomerVisits table, whose rows detail every product a customer has viewed in their web browser, as well as the time they viewed it. The stream declaration allows the stream to be instantiated, and tells the IO library, Sfio, to map its input to our new type. For simplicity, each product belongs to only a single category.

typedef struct {
    ip_t address;
    prod_id product;
    cat_id category;
    time_t viewed;
} visitRec_t;
stream visitDetail_s { getValidVisitRecord : Sfio_t => visitRec_t; };

The first main difference from C to Hancock is that a Hancock program begins with sig_main instead of main. In our example, the entry point is

void sig_main ( visitDetail_s visitsStream v <v:> )

I'm following the AT&T convention here, so the _s at the end of the type indicates that the type is a stream. It's kind of a reverse Hungarian Notation. What's most interesting is that the stream type, internally, is kind of like a linked list, and data will be iterated over in a similar manner. The coolest part is that the code for opening the file which stores the data stream is handled for you: the <v:> bit is shorthand telling the program that the -v command line option will carry the filename to connect the stream to. We'll get more into streams later, but the fact that all that command line processing is handled for you at the language level is huge.

The meat of the program will be the iterate block, which contains the instructions for how to process through the data. In some ways, this reminds me of LINQ from C# 3.0, but the techniques are different enough that it really isn't a totally fair comparison. The syntax is pretty clear. The following snippet of code is supposed to do something every time a customer visited a product in the BABY_CLOTHES category.

iterate (
    over visitsStream
    filteredby(v) (v->category == BABY_CLOTHES) ) {

    event (visitRec_t *v) {
        // Do something.  Any C function would be valid here.
    }
};

An iterate block can have multiple events of interest. In this case, there is a single, anonymous event, because the way the iterate statement is filtered (i.e., by category == BABY_CLOTHES), the only records available to trigger an event are the ones we're interested in. The anonymous event is similar to the default clause in C's switch-case construct. Other than that, the code is pretty straightforward: over denotes the name of the stream, and filteredby gives instructions on how to limit the search. In C#'s LINQ, this would look something like this:

var data = from visit in visitsStream
           where visit.category == BABY_CLOTHES
           select visit;

foreach (var datum in data) {
    // Do something.
}

Very similar. Since Hancock has been around since 2000, and C# 3.0 is just on the verge of release, I wonder if LINQ was at least inspired by Hancock. Both serve similar goals of making it easier to process relational data. If anything, the similarities between the technologies serve as an indicator of how potentially powerful and useful this sort of syntax is. In some ways, I think the LINQ syntax might be easier to follow. Partially because it's written closer to SQL (which is more familiar). Partially because it returns an IEnumerable type which you can do basically whatever you like with. With Hancock's iterate statement, once the data has been processed it's gone, unless you've explicitly stored it somewhere yourself.

For the most part, such simple handling wouldn't be very useful on its own. But what streams are to Input, maps are to Output. Maps are basically hash tables which allow you to define certain parameters: every map defines the range of its keys, the type of its values, and a default value. Really interesting is the split directive, which seems to let you tell the computer how large a block of entries to load at one time. This is interesting in that it gives the developer a lot of power in deciding how much system memory to devote to a process, in order to find a level that minimizes disk accesses. The intricacies of how that works will have to wait until later, but the basics are pretty simple. If we're trying to determine how many visits a particular product has received, our map might look like this:

map productViewCnt_m {
    key (MINPRODUCTNUMBER .. MAXPRODUCTNUMBER);
    split (10000, 1000);
    value int;
    default 0;
}

The iterate function merely has a few new directives:

int view_cnt;

iterate
    (   over visitsStream
        sortedby product
        filteredby(v) (v->category == BABY_CLOTHES)
        withevents productDetect ) {

    event prod_begin(product_t p) {
        view_cnt = 0;
    }

    event product_viewed (visitRec_t *v) {
        ++view_cnt;
    }

    event prod_end(product_t p) {
        pvc<:p:> = view_cnt + pvc<:p:>; // Adds this view_cnt to the existing value.
    }
};

In this case, the productDetect function handles mapping records to the event calls. prod_begin is called each time a new product number is encountered, product_viewed is called for each view, and prod_end is called just before the number changes. pvc is just an instantiated copy of productViewCnt_m. Incidentally, while withevents takes a function, so does filteredby. In fact, the syntax above for filteredby simply creates an anonymous function to do the filtering. Declaring a function to handle the filtering is trivial: it needs to return an integer and take a single argument matching the type the stream maps to, and the value it returns is treated just like any other boolean comparison in C.

There are a lot of more advanced features to make data access more manageable: Views, Multi-Unions, Directories, Pickles, etc. The Hancock manual is pretty good, and I plan to dig into this technology deeper. Hancock is an interesting domain-specific language, with some features that we're starting to see become more mainstream through LINQ in C# 3.0. While AT&T's use of this technology may be subject to moral questions, this level of advanced data analysis has a lot of potential uses. From analyzing customer usage for billing purposes, to investigating internet traffic for the purposes of noting security trends, to examining customer interests to better serve their needs, Hancock appears well suited to analyzing enormous amounts of data with relatively light hardware requirements. I'll be blogging more about this domain-specific language, and am considering a project to write in Hancock, just to figure out how it really works.

GNOME Support of OOXML

Russell Ossendryver, who sits on the OpenDocument Fellowship, recently heavily criticized GNOME for supporting Microsoft’s Office Open XML (OOXML) format. While I agree with the basic intent of Russell’s post, that the Open Document Format is already an ISO standard and there is no need for a new standard, I feel that he does a poor job of understanding the events he’s commenting on.

There are only a few instances that I am aware of where GNOME Foundation Members or Developers have given any indication that OOXML is a worthwhile format. Earlier this year, Miguel de Icaza wrote several lengthy posts about why he felt that OOXML was a good standard. A large part of this was simply that he felt it was a more complete standard than ODF. Miguel claims that ODF has 4-10 pages devoted to documenting Spreadsheet formulae and functions, while OOXML has 324. This raises an interesting point, as many people have criticized the OOXML standard heavily due to its 6,000+ pages, compared to ODF’s trim 722. Miguel’s point is that implementing an ODF spreadsheet application based on the ODF documentation alone would force you to examine an already implemented solution. I suppose he would know.

Finally, the gnumeric team has said that it was easier to implement support for OOXML than ODF. When you look at the history of gnumeric, this really isn’t a surprise. gnumeric used to be criticized for being such a perfect Microsoft Excel clone that it even reproduced several of the more insidious bugs in Excel. Humorous, yes. But not necessarily good. In effect, gnumeric was structured internally in a fashion very similar to Excel, which made supporting OOXML very easy for the gnumeric team.

Really, this all comes down to which is the better standard. An open standard is an open standard, regardless of who drafted it. Admittedly, I am as wary of Microsoft as most long-time Linux users are, but any technology they’ve released to the world as an open standard doesn’t make me as nervous, which is why I support Mono. Again, I can’t argue with Miguel about which spec is more complete, but there are others who can. According to Rob Weir, OOXML’s formula descriptions are deeply flawed, such that all that description doesn’t matter since the answer won’t be correct anyhow. Plus, OOXML is nothing more than an XML-ized version of Microsoft’s old binary formats. Microsoft did nothing to clean up the format to be human-readable and editable. It’s still covered in optimizations that were made 15 years ago, but simply don’t make sense in today’s world. The file format used in Office 2007 isn’t actually the same format submitted to ECMA and ISO. I could go on.

OOXML is a poor standard, not because it comes from Microsoft, but because it doesn’t offer any of the benefits that XML should offer. It’s a binary format wrapped up in XML, with so many interdependencies that the only good way to modify an OOXML file programmatically is to use the Office COM objects, since editing the XML directly is almost certainly going to break something. OOXML isn’t even a step forward in Microsoft Office documents, let alone in Open Standards. Plus, ODF is better for business, since it natively supports document-wide options to ensure that calculations are done with correct precision, which is very important in accounting. Microsoft Office has a long history of rounding errors which can cause problems, and there are no signs that they’ve gotten any better with Office 2007.

While I agree with Mr. Ossendryver on the overall point that ODF is simply the better format, his criticism of GNOME for supporting .NET as well is simply foolish. No Open Source project (aside from Mono, obviously) can compare to what .NET is. Even Java is a poor comparison to .NET, as it lacks many of the features that make .NET so interesting: the ability to easily share code written in different languages, and to share variables between those languages. Plus, in my opinion, C# is simply a superior language to Java, though its existence has begun to push Java to new heights I don’t believe it aspired to before. Ultimately, Mono wasn’t about Microsoft interoperability. It was about making development for GNOME easier, and I think it’s done that. However, interoperability with Microsoft technologies is not a bad thing for Linux as a whole.

It’s no secret that Microsoft has been pushing .NET incredibly hard since its inception. With .NET 2.0 and the new .NET 3.5 being released soon, developers for the Windows platform have really embraced the technology. Mono provides a means to show developers that the application they developed in .NET can be made to support Linux with relative ease. Making it relatively simple for developers to target multiple platforms with a single codebase increases our choices of applications on all platforms. In his interview on FLOSS Weekly, Ryan ‘icculus’ Gordon makes an observation based on his time at Loki Games: targeting individual games wasn’t going to get Linux anywhere. It was in the best interest of Linux and Linux Gaming to target platforms, such as iD’s engines (which iD has always been good enough to port), and the Unreal Engine.

By making these game engines available on many platforms, developers can choose to target multiple platforms without having to work too hard at it. While few licensed games for the Unreal Engine have been released for Linux to the general public, this has evidently made a large impact on Arcade Game manufacturers, who can cut costs by using Linux and only having to license Unreal. Mono serves a similar purpose. It provides developers with the option to use a different platform, and it’s apparently become fairly popular among ASP.NET deployments.

I still feel that .NET is a good Open Standard and a solid platform, and I’m pleased with the work the Mono team has done to develop it. While I fully support ODF, and hope it becomes the format of choice, I argue that we should fight against OOXML because it’s a substandard format, not because it was written up by Microsoft.

Cryptographic Keys and the Web of Trust

In my last post on Cryptography, I referred to the Web of Trust concept which is vital to most Public Key Cryptosystems. One point I made was that employers are in a unique position to assign trust, as they already have to verify identity before they hire their employees. This verification of identity is core to the Web of Trust.

However, what is trust? When dealing with Public Key Cryptography, we have to change our definition slightly from what we normally expect. In the Cryptographic Web of Trust, we aren’t indicating that we trust a person with anything personal of ours. Signing a user’s key is nothing more than an indication that they are who they say they are. Using the old cryptography pals Alice and Bob as a demonstration lets me explain how this works. Alice and Bob need to communicate securely, and they both have Public/Private Key Pairs. Somehow, they will have given their public keys to one another (this will be discussed later). Once Alice has verified that Bob’s key is really Bob’s key, and Bob has done the same for Alice, they can each sign the other’s public key with their personal private key, to demonstrate that they have faith that the key is valid and belongs to the person they say it belongs to.

Sounds simple enough, right? And really, this is the goal of the key-signing parties I mentioned last time. However, what if you need to send an encrypted message to someone you’ve never met before? Let’s take Carl, a friend of Bob. Carl needs to send a secure message to Alice for some reason, but he’s never met Alice, and it’s important enough that he doesn’t have time to verify Alice’s key himself. But he’s signed Bob’s key before, and Carl knows Bob well enough that he feels Bob will be a responsible key-signer, so Carl feels that Bob’s signature on a key is good enough to make him comfortable trusting any key Bob has signed. This takes us to the next stage of the Web of Trust: assigning trust values to signatures. With this, Carl can tell his key management software that he has full faith in Bob’s ability (and tendency) to properly verify an identity before signing a key, and thus any key signed by Bob is just as good as if Carl himself had signed it.
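
To make that concrete, here is a minimal sketch of the rule Carl’s key management software is effectively applying. It’s in C#, and entirely my own invention; a real keyring such as GnuPG’s also tracks marginal trust, expiration, revocation, and plenty more:

using System;
using System.Collections.Generic;

class PublicKey {
    public string Owner;                                // who the key claims to belong to
    public List<string> SignedBy = new List<string>();  // owners of the keys that have signed it
}

class Keyring {
    public List<string> PersonallySigned = new List<string>();    // keys this user verified and signed
    public List<string> FullyTrustedSigners = new List<string>(); // signers whose signatures we accept outright

    // A key is treated as valid if we signed it ourselves, or if it carries
    // a signature from someone we fully trust to verify identities for us.
    public bool IsValid(PublicKey key) {
        if (PersonallySigned.Contains(key.Owner))
            return true;
        foreach (string signer in key.SignedBy)
            if (FullyTrustedSigners.Contains(signer))
                return true;
        return false;
    }
}

class WebOfTrustDemo {
    static void Main() {
        Keyring carl = new Keyring();
        carl.PersonallySigned.Add("Bob");      // Carl verified Bob in person
        carl.FullyTrustedSigners.Add("Bob");   // and has full faith in Bob as a signer

        PublicKey alice = new PublicKey();
        alice.Owner = "Alice";
        alice.SignedBy.Add("Bob");             // Bob verified Alice and signed her key

        Console.WriteLine(carl.IsValid(alice)); // True: Bob's signature carries Carl's trust
    }
}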

This idea of key validation is the greatest challenge to the growth of a healthy Public Key Cryptosystem. Most people are really bad at properly verifying crypto information before they sign it, and at verifying trust on a key before they use it. Social Networking sites have shown just how problematic this is likely to be. People will mark one another as ‘friends’ on the slightest pretext. While this tendency may not extend to the signing of Public Keys, it is still a troubling trend. If people sign keys too imprudently, the value of that person’s signature goes down considerably. Let’s say that Alice is a person who will sign keys because she’s received e-mail from a person claiming to be Bob. Now, it might really be Bob, it might not, but Alice signs the key after sending a few e-mails back and forth without really verifying that Bob was the real owner of the key. If Eve was claiming to be Bob, Carl might get an e-mail from Eve, see Alice’s signature, and think that Eve might actually be Bob based solely on the merit of Alice’s signature.

With the challenges and education necessary to build a strong Public Key Cryptosystem, it isn’t surprising that the technology hasn’t really caught on. A few years ago, this was attributed to the fact that using the encryption software was too difficult. I don’t believe this is true any longer, as most systems which support PGP or GPG (the most popular Public Key Cryptosystems out there, which are compatible with each other) will integrate into software that requires it (e-mail and file management mostly) in a fairly trivial manner, certainly easily enough that most users would be able to adapt. The problem is in teaching users the value of signatures, and what it means in the grand scheme of the Web of Trust to sign a key. This is the great challenge, and as with most things, the challenge is not really a technical one.

Thawte has taken an interesting approach to this problem. With the Thawte Web of Trust, they have built a system which uses S/MIME to sign and encrypt e-mails. Thawte’s Web is built around the idea of Notaries and Trust Points. In order for a user to even be considered ‘Trusted’ in the Web, which allows them to assign a name to their e-mail certificate, they must acquire 50 Trust Points. This is done by meeting a Notary, in person, and getting them to verify your identity. As a Notary can only give out a maximum of 35 Points, you need to meet with at least two notaries before you can be considered ‘trusted’, and at least three before you can reach the 100 Trust Points required to become a Notary yourself, and even then you are limited in the number of trust points you can assign. This trust model is interesting, in that it takes the context of trust out of the hands of individual users. You don’t know who has given a key its trust points, only that it has received the points.

Thawte’s web, and S/MIME in general, have some drawbacks that I view as pretty serious. They only work with e-mail. A PGP or GPG key can be used to sign or encrypt e-mails, or any individual file. Thawte’s certificates are assigned to e-mail addresses, not people, so they aren’t really applicable for cryptographic signatures on contracts. Again, with GPG or PGP keys with a high degree of trust, a digital signature against the file containing the contract serves as a reasonable proof of identity of the signer, and can be used to verify the version of the file which was signed. Encryption tends to have other problems in general. E-mail clients that don’t support encryption will behave strangely when presented with encrypted or signed e-mails. Web-based e-mail clients don’t support encryption, as there are too many questions on how it should be supported. Malware can be hidden from scanners if wrapped up in an encrypted file.
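
As an aside, the sign-and-verify mechanics this relies on are easy to demonstrate. Below is a rough .NET sketch of the general principle only (raw RSA over a file’s bytes, not PGP itself, which layers key management and its own packet format on top of this); the file name is made up:

using System;
using System.IO;
using System.Security.Cryptography;

class FileSignatureDemo {
    static void Main() {
        byte[] contract = File.ReadAllBytes("contract.txt"); // the document being signed

        using (RSACryptoServiceProvider rsa = new RSACryptoServiceProvider(2048)) {
            // The signer hashes the file and signs the hash with the private key.
            byte[] signature = rsa.SignData(contract, new SHA1CryptoServiceProvider());

            // Here the same key pair does the verifying; in practice the verifier would
            // hold only the public key. Verification fails if the file changed at all.
            bool valid = rsa.VerifyData(contract, new SHA1CryptoServiceProvider(), signature);
            Console.WriteLine(valid ? "Signature good" : "File or signature altered");
        }
    }
}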

In my opinion, the webmail issue is the greatest one. When I was the sysadmin at CB Apparel, I made an effort to move the company to IMAP from POP3, including support for web-mail. We would use transport layer encryption to protect the data during transit, but we had no public-key encryption infrastructure (nor any immediate need for it). Unfortunately, there was no existing method to implement encryption into the web mail. For that to work, the Private Key would need to be stored on the server. This actually isn’t the end of the world, since any private key should be protected by a symmetric key with a good, and more importantly long, passphrase. But then the passphrase needs to be sent across the wire to the webmail server, and the user is left to trust that their passphrase is being disposed of in a secure fashion. It’s a very, very difficult problem, and one that is still being worked through. Ideally, the encryption of the message would be done on the user’s local machine, but that raises its own issues when dealing with Web-based E-mail: Javascript is notoriously slow, and there is no way to load the encryption key from the hard disk (thankfully). As for the other issues, non-supporting e-mail clients are getting increasingly rare, and there is never a good reason why a user shouldn’t have some form of local malware detection. Yes, even you Linux and Mac users. Install it now, before you get infected by a virus rushing to be the first to infect your platform.

Overall, I believe the main reason these technologies haven’t been taught to the majority of users is that there isn’t a perceived need in our society for high-grade encryption for the masses. I once had a professor who felt that citizens should be issued encryption keys, which would be verified and signed by the government. An interesting ideal, and it could lead to the large scale adoption of encryption technology, if the government hadn’t attempted to force the integration of technologies that would limit the effectiveness of cryptography. I fully believe the need for high-grade encryption is out there, and it is more and more important in our increasingly digital world. But until that need is perceived by the masses, it won’t happen. Next time, I’ll be posting about the Key Distribution problem: the issues we’ve encountered with it, and how we’re working to solve them.

How Much is Your Privacy Worth?

In my recent post regarding Apple and DRM I made some comments about how little value most young people seem to place on Privacy these days. Most people are posting incredible amounts of personal information all over the Internet. I try to be pretty careful to not put anything more specific than the town I’m living in (okay, in College, I would mention the Residence Hall I occupied), at least in even remotely public discussions. It doesn’t matter if I know 98% of the people in an IRC chat room; I don’t necessarily trust them all, and there still might be some crazies in that 2%.

One of my coworkers was telling me that his three-year-old had figured out how to turn on a Macintosh computer, then find and start a paint program she liked to use. How much longer before she’s introduced to the Internet? I certainly understand the impact technology has on young people. My family had a computer in the house from the time I was about five years old. I knew far more about the device than either of my parents did, even at that young age. The digital divide isn’t going to be as significant for this upcoming generation as it was between people my age and our parents, simply because people like me, who grew up with Computers, are starting to have children. We have lived the computer revolution, and there hasn’t been a paradigm shift significant enough to throw the greatest benefit to the youngest people just yet.

Still, with the growth of the Internet, mobile phones, and ubiquitous computing, people seem to have lost all sense of proper trust. The line between public and private has blurred as we all become our own personal paparazzi, airing our own dirty laundry without any regard for consequences. More than once, otherwise qualified candidates have lost opportunities because of things they’ve posted on the Internet. In blurring the lines between Public and Private, many people have destroyed their ability to hide their private life from their professional life, causing everything they do to be scrutinized by potential clients and business associates. Is this right? Probably not. But it’s the way it is, and the way it’s going to remain.

You can’t blame the Internet for this. After all, the Internet is nothing more than an enabling force, a system which facilitates communication and sharing. Ultimately it is the responsibility of the User to ensure that they share only that which they are comfortable sharing. However, other technologies threaten to reduce or eliminate a User’s ability to choose. Cities like London are literally covered in cameras, ostensibly meant to solve crimes and detect threats. While the success of these programs may be called into question, the implications for Privacy are hard to ignore. One might argue that the feeds are only accessible to Law Enforcement. This is true, to a degree. But any time a feed is made available, it can, under some circumstances, be made available to unintended parties.

As an example, I was listening to the Lex & Terry show on my way to work a few days ago. They had a guy call in to talk about a situation he found himself in where a neighbor had, unintentionally (I hope), made a wireless camera feed of his bedroom available to his neighbors. The caller had discovered this when his daughter finished playing with her Barbie web cam: when the camera was turned off, they got a different video feed of a bedroom. Once they figured out who it was, they would call the guy while he was in his bedroom, and more than once apparently witnessed him masturbating via this camera. I am operating under the assumption that this neighbor had no intention of broadcasting his bedroom 24 hours a day, 7 days a week. I believe he had a very specific desire when he set up this camera, and I don’t think it’s worth posting any of that speculation here. Still, because he didn’t configure the device correctly, he inadvertently allowed himself to be spied on by his neighbors.

Misconfiguration can cause all sorts of data to be accidentally released to the public. I once worked with a company that inadvertently published their customers’ private data (including addresses and credit card numbers) on the web, and Google happened to find and index it. I removed the offending data, and had to work hard to get Google to remove their cached copy quickly. This was a company with several million dollars a year in sales. If individuals and companies can accidentally release private data, are we to believe the government might not make a similar error? By allowing the government to film us whenever we’re in public, we’re trusting them implicitly to ensure that no one else can get their hands on that feed, and that they aren’t harvesting enormous amounts of data about our habits. Where we go, and when. Who we spend time with. What brands we purchase. What shops we frequent. Enormous amounts of data, which could easily be used to create a frighteningly complete profile of our lives. And if that information were to make it into the wrong hands…

The Camera issue may seem a little bit paranoid, and the above may have been slanted toward the paranoid. There was a reason for that. This is a situation where the availability of this information is not being controlled by the user. Where Data Mining (and you know the British government is using the London Cameras to mine data about their residents) is being done with information you can’t control, save by never leaving the house.

Can we avoid data mining? Not without some inconvenience. Every time I buy anything using a credit card, or even a check, that information is logged by my bank, and often by the organization I’m buying from. Any time I visit an online store, they’re examining what I’m purchasing (and likely what I’m viewing) in an attempt to better offer me things that I wasn’t explicitly searching for, but might be convinced to buy. There is big, big business in mining information about you, usually for the purposes of advertising. Almost always this information is collected in such a way that you have no idea it is being tracked. I’m not against advertising, but I prefer my advertising to be in the widely-targeted-at-a-venue’s-perceived-demographic style of Television, rather than the narrow-we-know-who-you-are-and-what-you-like style the Data Miners are trying to make possible. Google Ads, which are used on this very site, appear to focus their advertising based on the content of my pages, rather than any information that Google has on you personally. I’m not 100% certain this is the case, and Google is certainly in a unique position (Adsense, GMail, Search, Analytics, etc.) to be a data-mining giant, and they do utilize their position somewhat. Still, I have no reason to believe that Google is any less moral than any other Advertiser out there, and I think they might be somewhat more so. Until I’m proved otherwise, I’ll continue to favor Google over other search engines.

Of course, I have ignored one last source that companies use extensively these days to gather data about customers: Rewards Cards. Almost every major chain store offers some form of rewards program where, by spending money at their store, you get something back. If they offer such a program, you can be guaranteed that any attempt to shop at that store will be met with an attempt to get you to sign up for the card. Part of the reason for the card is to convince you to shop at their store above other choices. I believe most people who sign up for these programs are likely to sign up for them at a lot of places, rather than be faithful to a single store because of them. Actually, this is a place where Barnes & Noble is ingenious. By charging $25 per year for membership in their rewards program, people are far, far less likely to shop at competitors. I know it’s impacted my own purchasing decisions in the past.

By using these Rewards cards, everything you purchase is being tied to an account. And that account can be tied to you. Since most of these accounts can be referenced via phone number, I’ve given some thought to signing up for these accounts and sharing the fake number associated with them with many people in order to invalidate the farmed data (of course, you’d need a lot of people, and geographic proximity to a decent number of them, for this to work). I know this isn’t a unique idea, and I suspect many people have done it. To combat this, some stores are fighting back. Safeway Stores now has a program where you get $0.10 off each gallon of gas you buy (redeemable for only one pumping session) for each $100.00 in groceries you spend. In a situation where you’re trying to fight against the data mining efforts, you lose your ability to really take advantage of a program like this. Staples’ RewardZone program sends a voucher for store credit based on the amount of money paid in store. By offering tangible rewards which take effect by later action (instead of relying solely on immediate rebates), stores make users less likely to share their credentials for reward programs.

It turns out that I’m willing to sell my privacy (for my food purchases, at least) for 20%+ off my food bill. What’s your price?

Integrating High-Grade Encryption into an Enterprise

Security of Data can not be well maintained without the use of high-grade encryption.  Even relatively low-grade encryption, such as the 128-bit SSL keys popular in Web communications, can be broken given enough patience.  Of course, data security needs must be managed carefully against performance considerations.  It doesn't matter if your data is absolutely unbreakable if using it is so inconvenient that no one will bother.  Of course, computer power has increased at such a rate lately that encryption times are no longer as much of an impediment as they used to be.  Ten years ago, who would have dreamed of using an encrypted filesystem?  These days, it would be foolish to consider having any private data on a laptop that wasn't encrypted.  Which leads, easily, into the First Law of Enterprise Encryption:

All portable computing devices must be protected with a proven encrypted filesystem.

This ensures that even if a piece of hardware is stolen, the data on it will be safe from prying eyes.  Of course, anytime you keep encrypted data, some sort of security is needed to protect the key used to decrypt the data.  In most systems, this will be implemented as a passphrase.  Brute-forcing has come a long way in taking advantage of the increased parallelism of modern systems, but don't be fooled: unless your password is very short, and very simple, even the most advanced Brute Force method will still take years to complete.  That isn't to say that your current password is good enough, or that you shouldn't work hard to protect it.  I've seen plenty of good heuristics which can greatly improve the performance of a Brute Force attack by taking advantage of common patterns.
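
For a rough sense of scale (a back-of-the-envelope figure, assuming a truly random password and a very generous billion guesses per second): a 10-character password drawn from upper case, lower case, and digits has 62^10, or roughly 8.4 x 10^17, possibilities, which works out to more than 25 years to exhaust the whole space.  Drop it to 8 characters, though, and that same attacker finishes in a couple of days, which is why length buys you far more protection than cleverness.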
 
Interestingly, passphrases have deterred many organizations from adopting high-grade encryption.  The reason is simple, and I'm sympathetic to it: if you lose your passphrase, or your key, any data encrypted with that key is lost forever.  But losing keys should almost never happen.  Storage is cheap.  Backups are easy, and plentiful.  Even printed out, a 4096-bit RSA key (generally considered enormous) will take only a few sheets of paper, and will be more resilient than a digital copy.  Backups should always be stored in a secure location (a safe would be ideal, or a bank vault).  If you must keep a backup of your passphrase, it should be stored in a separate physical location from the key.

Backup copies of keys and/or passphrases should be kept in highly secured facilities, which very few people have access to, and they should never be kept with each other.

Now that the data and passphrases are secured for portable devices, we arrive at the question of securing communications.  SSL provides good coverage for most purposes, and key sizes have grown quite a bit recently, providing far better security.  However, SSL has a few downsides.  First, it's expensive to get a key that is signed.  Part of this is the Certificate Authority (CA) needing to go through the steps of verifying your identity and worthiness of a signed key, and it's worked out fairly well for the web, where a list of trusted CAs may be maintained by the browser vendor.  Of course, this implicit trust model could be exploited, though most organizations have enough interest in keeping their reputation in order to prevent gross abuses.  The key problem lies with the fact that it is nigh-impossible for a user to add a new CA key to their systems.  If you wanted to trust that I was a worthwhile signer of keys, and that you could trust any key I put my digital signature on, it isn't straightforward to add my key as a trusted signer.  Further, people looking to get keys signed would be unlikely to choose me as the signer of their keys, since if my key isn't listed as a trusted key in a user's browser, the user is presented with errors.  In effect, we've pushed control of the web of trust into the hands of corporations like Verisign, who charge $300-$1200 per year for a key.  Even though Thawte offers e-mail keys for free, they don't allow the key to be used for business purposes, making it basically useless in our target today: the Enterprise.
 
The current state of crypto-management, with a small number of dedicated CAs and a high cost of entry, is a symptom.  It's a symptom of the lack of good integration of cryptographic technology and thinking into our current environment.  Some people say it's because we're not evolved correctly to handle modern security and trust issues as they relate to the Internet.  I think that's unfair.  Modern information security may not be instinctual (just yet), but we can still learn to use it, and as the technology becomes more tightly integrated into our computing systems, we won't have to worry about it.  However, I think the way a healthy crypto-ecosystem is supposed to work is perfectly understandable to most users.  It's built around the Web of Trust concept.  Most people already live their lives according to a web of trust theory.
 
You tend to trust your close family and friends, and therefore will extend a bit more trust to those people who demonstrably know someone you trust.  For instance, if my father indicates that he trusts someone he works with, I'm more likely to trust that person because my father does.  This is a very basic level of how our society is organized.  Yes, it can be taken advantage of (con-artists take advantage of this aspect of trust all the time), but public-key cryptosystems require a more explicit expression of trust than normal society: the key signing.  Some people even have get-togethers where they arrange to sign one another's keys, thus expanding the web of trust.  Key Signing Parties are an interesting approach to the problem, but ideally, all people who use a digital system would have their own private key, and the web of trust could be grown more organically than it is today.
 
Large organizations are in an interesting position to solve the web-of-trust issue.  Large companies already perform verification of identity when they hire a new employee, and in doing so, could easily sign an encryption key for the user, having verified that they are who they say they are and that the key does belong to them.  For an organization which consists of many departments, a 'keymaster' could be assigned, whose responsibility it is to sign verified keys.  As long as the trust model ensures that users trust the 'keymaster' for a department, all keys signed by that 'keymaster' will have at least some level of trust associated with them.  This allows for secure communication between all users on a cryptosystem, and allows for the transmission of even the most privileged data, since it will travel through the network in an encrypted state, safe from prying eyes.
 
Unfortunately, there are problems facing all well-meaning cryptosystems: key distribution, key revocation, key expiration.  There are a lot of questions and arguments about how to handle these issues.  I'm not going to touch on them today, but upcoming posts will deal with these problems, and possible solutions.

Vista Causing More Windows Mindshare Slippage?

Rupert Goodwins, a ZDNet writer, has posted his impressions of Ubuntu compared to Vista from the perspective of a long-time Windows User. Interestingly enough, he feels that changing things for the sake of change is Vista's biggest problem. Is this another example of the Feature Creep that Jeff Atwood has been bemoaning lately?

Personally, I haven't had much of a problem with Vista. But then, I'm a long time Unix user, and even though I've worked as a Windows System Administrator, I don't consider myself to be a Windows Guru. Not by a long shot. I brought far more general systems knowledge to the table than Windows Specific stuff, and for the size and age of the network I was managing, that was fine.

Since I wasn't a Windows XP genius, the change to Vista didn't really bother me. The eye candy aspects are pretty stupid, and don't even look particularly good, especially compared to what's in Compiz. But you shouldn't get an OS because it's pretty. And there were some good changes. Roaming Profiles work a lot better than they did in XP. They still aren't what I would consider perfect, but they are definitely much improved. Of course, it still isn't as good as Unix, where literally everything can easily be kept in your profile, which follows you around the office.

GTK+ and Object Oriented Programming in C - Not worthwhile?

I was reading the Monologue this morning, and a blog post from Laurent Debacker came across the wire. He was writing about the Ribbon Sharp library he’s been working on, and his own feelings about the GNOME project from a GNOME outsider. Now, personally, I don’t much care about the Ribbon support he’s been working on, as the Ribbon is probably my single most hated change in Microsoft Office 2007. That may just be because of how severe a paradigm shift ribbons are compared to the classic menu view.

However, his comments about GNOME were interesting to me. Laurent admits that he is pretty new to the GNOME community, and he appears to be fairly young (though probably close in age to myself), which may well explain a lot about his views on software development. The specific comment I’m referring to is this:

I think that the main problem with GNOME for a new developer, is that programming in OO in C using Gtk+, is a bit like programming in COBOL.

I can definitely see where he’s coming from with this comment. However, I view this primarily as a problem with the education provided by most Programming courses. Object Orientation is not a technology. It’s a paradigm.

Certainly, it is much easier to perform Object Oriented Programming in a language which was built around the paradigm (C#, Java, C++). But once the paradigm is understood, migrating it to a traditionally procedural language such as C is trivial, and will bring about the same design benefits. If anything, being written in C is a huge benefit for GTK+/GNOME, as its C roots have facilitated the porting of the libraries to a vast number of platforms, and bindings have been written in a large number of languages. (To be fair, Laurent does mention the success of the bindings.)

However, the largest reason GTK+, and by extension GNOME, are written in C is history. GTK+ was designed as a replacement for Motif, when the GIMP project decided they needed to migrate away from Motif to be a truly open piece of software. Motif was a C library, GIMP was a C program, so C was the only obvious choice. Then, ten years ago, Miguel de Icaza decided to start the GNOME project, in response to KDE using QT and QT being non-free software. At the time, Miguel didn’t view GTK+ as ideal, but it was far better than programming straight X11, and it was available. (For more on the founding of GNOME, check out Chris DiBona’s interview of Miguel on FLOSS Weekly.)

So, C was the language of choice, and by now, so much of the backend is written in C that I don’t think that’s going to change. The only potential ‘disadvantage’ I can see of GNOME remaining in C is that in order to add a widget to GNOME such that it will be available to all bindings, it has to be written in C. I don’t really see this as a problem, as I don’t think C is going anywhere anytime soon, and GNOME performs very well having been written in C. Plus, the alternative proposed by Laurent carries with it several distinct disadvantages.

  1. If GTK+ 3.0 were to be ported completely to .NET, it would only run on platforms where Mono was supported. Currently, that list is comparatively short.
  2. C code would still be required to bind to the underlying system APIs (X11 on Unix, Win32 on Windows).
  3. If the language isn’t .NET compatible, it can’t be bound.
  4. It would require reimplementing an enormous amount of code, for minimal return.
  5. Some people would drop GNOME altogether if this were to be done. A lot of people don’t trust Microsoft or .NET, and by extension the Mono project.

I think .NET and Mono are great technologies, and I use .NET when I can. I have extended just enough trust to Microsoft to be willing to follow and work with Mono (hopefully on it, as well), but that doesn’t mean that a lot of people are willing to do the same thing. Until Microsoft can prove that they aren’t a wolf in sheep’s clothing (which many Free Software people will probably never believe), basing a platform like GNOME on a technology so encumbered by Microsoft patents (regardless of ECMA standardization and Novell agreements) is dangerous, and would be seen as highly alienating by a good number of users. It’s a risk not worth taking for GNOME at this time.

In short, I see far more disadvantages than advantages to attempting to convert GTK+ away from C. I like .NET, I love Mono, but neither they nor any other language provide a strong incentive for porting GNOME.

With Apologies to Steve Jobs

I’ve been pretty harsh to Steve Jobs lately, regarding the lack of an SDK for the iPhone. Jobs is well known for being something of a control freak. The iPod and iTunes have both always been pretty heavily locked down. Hell, if someone else would put out an open-firmware, upgradeable MP3 player, I’d look at replacing my old Neuros Audio Computer. Yeah, I know there is Rockbox, but the existence of Rockbox does not make buying an iPod any more appealing to me. It’s still a closed platform, as far as Apple is concerned.

But then, Apple has become a lot more open to developers in recent years. Mac OS X ships with XCode, and all the tools needed to write software for the Macintosh. Microsoft only offers their SDKs as a separate large download, and even then only provides a neutered development environment for folks who haven’t paid for it. Still, the landscape is a lot better than it used to be. Even so, though Apple and Jobs encouraged development for the Mac, they seemed to be heavily against it for their embedded devices, appearing more interested in heavily controlling the user experience than in giving users the freedom to do what they want with their toys.

The DRM embedded in MP4 and iTunes downloads, both video and audio, which is heavily integrated into the iPod and iPhone, is just one sign that Apple doesn’t want you to own the Media they’re selling you. Okay, so maybe this was due to pressure from the Recording Industry, and Apple has made strides in adding DRM-Free music (which is now even the same price as the DRM-laden stuff), but by embracing technology which restricts the user’s freedom, Apple showed that they were more interested in exercising control than promoting freedom.

Remember, this is the company that once encouraged us to “Think Different”. Now, the company has placed itself as the embodiment of what is chic and cool in Geek today, and in so doing did they give up that attitude of revolution? Steve Wozniak seems to think so.

Still, Steve Jobs announced today that Apple plans to release an iPhone SDK in February, allowing developers to write iPhone Apps. I had said many, many times since the iPhone’s release that Jobs was never going to do it, that he didn’t want anyone else to play in his platform, vehemently fighting off the Apple Apologists who were denouncing AT&T as the culprit here. I was wrong. Jobs is releasing an iPhone SDK, and I apologize to him for saying otherwise.

But is it enough? The new SDK will still require that all applications be digitally signed before they can run on the iPhone. How easy will it be for a hobbyist developer to get a key to sign their own applications? Or will it cost hundreds of dollars to get the right to have your application ‘certified’ by Apple? Apple says that they’re worried about the possibility of malware, viruses, spyware, etc. on the iPhone, and about protecting user rights. I’ll agree with Jobs that there have been some viruses that spread between cell phones. Yes, they were all Trojan Horses, since no phone I know of will auto-install anything, but social engineering is a well understood threat.

I suppose the responsibility ultimately comes down to the User. If the User is comfortable being locked into a sandbox, where all their toys are approved by the mysterious shadowy overlords on the other side of the fence before being supplied to them, then the state of things is the fault of the User for accepting this as the status quo.

In 1759, Benjamin Franklin published “An Historical Review of the Constitution and Government of Pennsylvania”, by an author unknown, which held the motto “Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety.” I’ve no doubt you’ve heard this quote, and while the context of the quote may not be quite the same, I feel the message still applies. If you don’t have full access to it, you don’t own it. For me, it is not worth trading in my ability to tinker in order to gain whatever amount of security the vendor chooses to supply me with. Even today, no Cell Phone Virus has been able to propagate without user influence. Users need to be educated, not hamstrung, to prevent the spread of malware.

Even today, deception is the primary method of spreading malware, and it’s proven to be highly effective. Users apply the same level of trust to a contact’s e-mail address, phone number, and SMS address that they would to the person themselves. Users share enormous amounts of personal data online (browse Facebook and MySpace for a sickening number of examples), without any concern for who might be looking. If a complete stranger were to approach you on the street and give you a package, you’d be suspicious (I hope), even if they seemed otherwise respectable. The same suspicion needs to apply to all digital communications. Unless you’re using high-grade encryption, using verified keys (and even then, you should be mindful of the consequences of what you’re being asked), you can’t verify that the person you’re talking to is who they say they are. On the Internet, all people are strangers, and should be treated as such. Like with anything in life, we need to be mindful of the risks, but ultimately the decision to take that risk should be our own.

MSDN Tech Session: VS 2008 and Silverlight

Today I journeyed up to Spokane with some guys from work to attend an MSDN Tech Session about some of the new stuff in Visual Studio 2008, .NET 3.5, and Silverlight.  The talk was presented by Mithun Dhar, a Developer Evangelist for Microsoft, and Mark Michaelis, a Systems Architect from Spokane who seems to do a fair amount of this kind of thing for Microsoft.

I was initially interested in attending for the talk on Silverlight.  VS2008 doesn't run on Linux, and Mono doesn't quite have .NET 2.0 support done, so 3.5 seems like a little ways away.  While I wasn't impressed enough to change my heathen ways, Microsoft has done some very important things in VS2008, and some of the new features in .NET 3.5 really seem interesting.  Silverlight (or should I say Moonlight?) does stand to be one of the more revolutionary web technologies to be discussed in recent times.

But let's start where today's talk did: Visual Studio 2008.  As I said, I wasn't overly interested in this talk, since I'm not going to use it at home, and I'm not sure we'll be using it at work any time soon.  There doesn't seem to be too much that's new in this version: improved Intellisense, Intellisense for Javascript, a split-mode for web design (I'm pretty sure Dreamweaver has done this for several versions), a slew of new modes, and of course support for all the fancy new .NET 3.5 features.

However, none of these were really enough to make me interested.  It seems Microsoft has finally decided that a good Javascript debugger is worth having, and integrating it with a web browser is important.  So, Visual Studio can now debug Javascript as if it were any other program.  It's pretty cool.  Plus, Mithun said that it works with Firefox and Opera in addition to Internet Explorer.  I'm not sure I believe that, but if it's true, that's really impressive for Microsoft.  This is definitely a place where Microsoft may have a win.  Venkman, the Firefox Javascript Debugger, is really not very user friendly, and neither it nor Firebug integrate well with any development environment I've ever seen.  I use Venkman and Firebug regularly, but the VS2008 Javascript debugger looks much, much easier to use.  Of course, it won't work on Linux, so I'm going to be sticking with Venkman and Firebug, but I do hope someone can implement something like that feature in MonoDevelop or something else.

.NET 3.5 has some pretty interesting features slated for its release, or shortly afterward.  First up is Extension Methods.  Extension Methods allow a developer to tell the compiler to functionally add a method to a class, without actually modifying the class.  An example given was adding a Capitalize() method to the String class, which would convert all the letters in a string to upper case.  This is a technology that I can see being hugely useful, and the syntax is pretty clear.  For instance, for the previous example:

static class MyExtensions {
    public static string Capitalize(this string input)
    {
        // Convert the input to all upper case, e.g. "hello".Capitalize() returns "HELLO".
        return input.ToUpper();
    }
}

The key to this is the new "this <object>" syntax, where <object> is the object or interface you're adding the Extension Method to, in this case string.  Scott Guthrie has a lot more to say about this subject, and this blog post is probably a much better reference than anything I could write right now.  In .NET 3.5, Microsoft uses Extension Methods to extend classes if certain features are enabled.  For instance, if you're using LINQ, which I'll discuss next, several Extension Methods, such as Where, are added to IEnumerable to extend its functionality, without redeclaring IEnumerable.

In some cases this makes a lot of sense.  If you only need to add a method or two, and the changes have a limited usefulness, they appear to be an invaluable tool.  I even suspect there is almost no overhead, since the linking of the extension method could simply occur at compile time.  However, I fear there will be times when people will simply write a handful of extension methods, when it really makes more sense to extend a class.  I'm afraid this may lead to more confusion down the road for people maintaining software, as some people will have needlessly and recklessly extended classes.  However, this may be more of an issue of poor coding standards than ill-conceived features.

The next new language feature we discussed was .NET Language Integrated Query (LINQ).  The reason LINQ is so amazing is that it allows a user to execute SQL-like statements against any collection.  Often it will be used when querying a database object, but the really cool thing I saw was that it would allow you to select a subset of any IEnumerable object based on selection criteria.  As a trivial example, consider the following:

ArrayList data = new ArrayList(); // Initialized with random numbers

foreach (var number in from int value in data
                       where value % 2 == 0
                       select value)
{
    // Do something with even values in data
}

As an example this is pretty lame, I'll admit, but it demonstrates the
basic idea.  You write SQL-like queries to get only the data
you really want from a collection.  I didn't much care for the
rearranged SQL-like ordering (from before select), since it's confusing
to those of us used to writing SQL, but it's done that way to provide
cues to Intellisense, so it can help prevent programming errors.  Plus,
you can use Lambda Expressions in the where clause to provide even more
power in the selection.
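
As far as I understand it, the same query could also be written in the extension-method form directly, with a lambda as the selection criteria.  Something like this (my own sketch, not an example from the presentation):

// Where comes from the LINQ extension methods on IEnumerable<T>; Cast<int>() is
// needed because ArrayList is the old non-generic collection.  Assumes System.Linq.
foreach (var number in data.Cast<int>().Where(value => value % 2 == 0))
{
    // Do something with the even values in data
}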

Another new feature I use above is the 'var' syntax.  No, C# does not
have VBScript-style Variants.  It is still a strongly-typed language,
but if you don't know the type of a returned variable off the top of
your head, you can just use 'var' and the compiler figures it out for
you.  I feel kind of strange about this feature, since it seems so
pointless.  At least in the Boo programming language, Duck Typing is
actually a variable type.  The use of the var keyword in this case
seems to encourage laziness.  One of the guys I traveled with did point
out one decent use: when you're instantiating a class and the type is
already on the right hand side of the statement.
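
To illustrate that last point, here's my own throwaway example (not one from the presentation), assuming System.Collections.Generic is in scope:

// With an explicit type, the long class name has to be written out twice:
Dictionary<string, List<int>> scores = new Dictionary<string, List<int>>();

// With 'var', the compiler infers the exact same type from the right-hand side:
var moreScores = new Dictionary<string, List<int>>();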

The last new Language Feature ties in with LINQ, but it's kind of an
extension.  Microsoft calls it LINQ for Entities (codename: Astoria).
Basically, it's the LINQ interface I was talking about, except that
instead of querying the database directly, or another IEnumerable
object, you create an Entity map of your data.  This is used by .NET to
automatically turn your relational-database rows into custom Objects,
where you can access and change your data through properties
(presumably this can be configured somewhat in the Entity definition).
Plus, you build the relationships between objects.  For instance, in an
e-commerce application, you would have a list of Customers.  Each
Customer would have a list of Orders, and each Order would have a list
of Products on that Order.  All of this is managed in Objects and
collections, allowing you, as the developer, to abstract the data in a
way that is more logical for the programming language you're using.

Pretty cool.  I'd rather just have an Object-Relational
Database, but this is still a good step.  Unfortunately, it can turn
loading your data into many, many calls into the database, so there are
some potential performance issues a developer will need to keep in mind.
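
A rough sketch of what that ends up looking like, reusing the Customers/Orders/Products example from above.  The 'db' object stands in for whatever context class the Entity map generates, so treat all of the names here as hypothetical:

// Query the entity map with LINQ, then walk the relationships as plain objects.
var bigSpenders = from c in db.Customers
                  where c.Orders.Count > 10
                  select c;

foreach (var customer in bigSpenders)
{
    foreach (var order in customer.Orders)
    {
        // order.Products can be fetched on demand behind the scenes, which is
        // exactly where the many-small-database-calls concern comes from.
    }
}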

And then there was Silverlight.  Silverlight is Microsoft's new Flash
competitor.  Unfortunately, Silverlight 1.0 is kind of lame.  That's
not completely fair, since XAML can be used to create some fairly
complex visual effects, and some Javascript-based interface tweaks that
can look pretty nice, but for the most part it's pretty basic
effects.  Silverlight 1.1, which isn't due out until next year
sometime, adds the ability to embed .NET code into the package.  This
is going to be huge.  Microsoft has really been pushing the video
aspect of Silverlight, and to be honest, it's nice.  It does 720p
natively, and does a good job of scaling when that resolution isn't
available.  It has good support for SVG overlays, so you can easily add
watermarks, subtitles, etc. to playing video.  Plus, it just runs well.
One of the demos was playing a video with a jigsaw pattern cut into it;
the presenter could scramble the puzzle, and while it was being
reassembled the pieces kept playing their part of the video.  It was
just amazing.  I'm going to have to look into helping Mono get
Moonlight up and running.  It's just going to be too cool.

Jaunt down to Lewiston

Catherine wanted to get down to Lewiston yesterday to stop by the JC Penney’s, because they were having a huge sale.  Some of it was needing clothing, some of it was needing drapes for our house to help deal with heat loss around our two huge slider doors.  Unfortunately, we weren’t able to find any drapes we liked that weren’t going to cost an arm and a leg.

The trip down was decent.  We went to Moscow for the Farmer’s Market, where we picked up a bunch of vegetables and a gallon of honey to last us through winter, so we just took 95 due south.  The drive was nice, and there are a few spots where you get a fantastic view of the Palouse from a few hundred feet up; I hadn’t realized you could get that far off the Palouse without leaving terra firma. Of course, the road down into Lewiston is about a 7% grade, which was pretty intense.  The Mazda really hated that on our way out of town.  In a sense, it made me glad I didn’t get a job down there, because the gas would have been a bitch.

Lewiston as a town reminded me a lot of Billings.  I suppose it should, since it’s the largest town between Spokane and Boise, the Tri-Cities and Missoula.  Pullman and Moscow are nice, but they’re fairly small towns, and I’m not sure they’ll even be very big towns outside of their respective Universities.  Which is fine.  I actually rather like Moscow. 

However, the steep hill into town is only the beginning of crazy things in the town.  The entire town is really, really hilly.  The Taco Time we stopped at for lunch had an incredibly steep entry, which made it tricky to get out, as the road was reasonably busy.  The highway into town has an intersection where two east-west roads come together with the main north-south drag.  Finding yourself in the wrong lane is easy enough, and even if you’re in the correct lane, it can still be terrifying.

But then, we came across my favorite part of the entire town.  The Safeway is perched high on this hill, the sides of which have been carved flat and had these enormous bricks placed along them.  It looks like the perfect vantage for a medieval castle, not a grocery store.  I tried really hard to get a good picture, but the only camera I had was my cell phone, which is really, really bad quality.

The trip aside from that was pretty non-eventful.  Catherine shopped, we didn’t buy any drapes, and I talked to AT&T about the upgrade to my phone plan I’m going to need when I buy a smartphone in December.  The Penney’s already had Christmas decorations up.  Christmas.  In October.  One of these days I’m going to head into a department store in March and there are going to be Christmas decorations up.

Long Week

Well, it’s been another long week at work. My meeting with the Registrars of all four WSU Campuses went really well. All the work I’d been doing and the design I’d done were completely validated over the course of the discussion. This may not have been my first choice of job, but it should be good for the next year. At least, once I get all this stuff for the posting in. All of a sudden, I need three letters of recommendation and an updated resume. The resume is easy enough, but I’m having to ask around quite a bit for those letters. Luckily, I’ve got several people who should be able to get me letters by the middle of next week.

New Hardware

I've been looking at buying new computer stuff for a while. My current rig has always been a little unstable, and I'm really left wondering if my Motherboard isn't maybe a little screwed up. Lately, I've been rebooting at least once every few days without actually requesting it, and my USB subsystem has certain devices that it will just decide to stop reading, usually while locking up my USB Mass Storage subsystem. Luckily it doesn't lock things up so badly that I can't do a proper restart, but a proper restart is required. I've tested the memory, and it seems to be fine, so I'm really at a loss.

Still, it's been about 5 years since I upgraded last (at least, it was a year or two before I met Catherine), so I think it's about time. Plus, with the recent drop in prices on CPUs, it seems like a really good time to upgrade. The research that I've done suggests that AMD is probably the best choice for desktop processors right now, when it comes down to power, efficiency, and price, and I've always liked their chips so that's who I'm sticking with. Keeping that in mind, I think it's best to start construction with a solid processor.

The AMD Athlon 64 X2 6000+ 3.0GHz Socket AM2 Processor is currently my weapon of choice. Multi-core was my only major requirement, and part of why I was attracted to the AMD chips was that each core has its own cache, unlike the shared cache on the Intel Core 2 Duo. In my opinion, the shared cache is a potential performance bottleneck on desktop PCs, because where multiple cores really shine is in running unrelated processes, in which case a shared cache basically cuts your cache size in half. This processor was a little more than I'd originally planned to spend, but I really think the few extra dollars will let this chip last another half-decade or so.

As I said above, I feel the real problem child in my case right now is probably the Motherboard. I don't remember the brand, but I think it was Biostar. Biostar's a good brand, but I think I may have just gotten a dud, one that worked just well enough that I didn't do anything about it. Of course, I think it may have been responsible for at least one hardware failure, in addition to that USB Mass Storage problem. But I've decided to go with Biostar again, on their TF560 A2+ ATX AMD Motherboard. The only downside is the lack of Firewire support, but I don't have any Firewire devices, and if I really needed it, the Mac has it, so I don't consider that a big loss for my desktop PC. Supporting up to 4 GB of RAM, a ton of USB connections, and sporting that one serial port, I think I'm going to be happy with this board, and the Crucial 2GB DDR2 800 Desktop Memory should be a nice accompaniment. The only worry I have about this motherboard is the strange "additional power connector" that's on the diagram. I'm not sure my current Antec True 480 has a connector for that.

Finally, to wrap the whole thing up, I needed a PCI-Express 16x Video Card. An ASUS GeForce 8600GTS looks pretty nice, plus it's DVI-only with two DVI ports. Now, if only I had money to replace this old CRT... At least the Mac came with the DVI->VGA Adapter, so I don't need a new one of those.

If anyone has any comments about what I'm looking at purchasing, I'd love to hear them. I've priced this out to about $450, plus any shipping, which I think is pretty reasonable. I'm planning to order probably within the next few days. It'll be nice to have a stable computer for once. I can't wait to know that when a bit of software crashes it won't be because of wonky hardware.

Update: So, I guess that special power connector is important, otherwise my fancy new Video Card wouldn't work. I'm going to stick with Antec, but I think I can easily get away with a 430W PSU instead of the 480W I'm running now, since I don't have that much running off it. I'm looking at an Antec earthwatts EA430 430W, not least because of its apparent efficiency. If it guarantees 80% efficiency under heavy load, it should do much better under normal loads.

When It Rains it Pours

Last week, Richard brought me into his office. I was being offered a position with the WSU Registrar’s Office. Really, it was just a one-year posting in a job I was already doing, but I was really excited because with the posting, benefits were coming. That was on Monday. I assured Richard I was interested, but asked for a few days to check on some other potential opportunities.

On Friday, I finally got in touch with Biological Sciences at the University of Idaho, to check on their Computer Scientist job in IBEST. Unfortunately, it looks like they decided to interview other folks, so by Friday, I’d decided to accept the WSU position. However, Richard was out, so I had to wait until Monday to tell him.

Monday rolls around, Richard doesn’t get in until 9:30, and I let him know as soon as I see him. He’s still getting the paperwork written up, but there it is. By 1:30 in the afternoon, Schweitzer Engineering was on the phone with me. They had another position open (a real one this time), and wanted to talk to me. I told them the situation and that I needed a day to make a decision.

I thought about it all night. I hadn’t signed anything with WSU yet, so I wasn’t under any sort of contractual obligation. But, I’d told them I was accepting the position. As much as I’m interested in working with Schweitzer, I didn’t feel it would be right to pursue something so soon after accepting a posting. So, it looks like, starting November 1, I’ll have a one-year posting at Washington State University, as a Programmer for the Registrar’s Office.

Hooray for Boolean Algebra!

I knew when I enrolled in Introduction to Logic that the course was going to be useful long into my career, and in truth, I've done quite a bit of work with Boolean Algebra in almost every program I've ever written.  What I didn't expect was to write out, solve, and simplify pure boolean equations ever again.  But today, I spent a good portion of the day doing just that.

Here at WSU not every student receives a mid-term grade.  In fact, very few do.  The rules, expressed in English, were very simple:

  • Graduate Students don't get mid-term grades
  • No mid-term grades are given for the Summer term
  • The Spokane Campus doesn't have mid-terms
  • Freshmen with less than 28 total hours do get mid-term grades
  • So do new transfers (in their first semester at WSU), excluding seniors (students with 90+ hours) and excluding Vancouver

However, when I dug into the code, I noticed that we weren't using the same equation everywhere to determine if a student was to receive a mid-term grade or not! Not only that, but I'm not sure that the equations in use were even correct! There were two places that I worked from when trying to fix this problem: our Class Lists application, and the Grading Submission Application.

ClassLists: ~GradStudent & ( NewTransferStudent | CreditsUnder28 ) & ~SeniorCredits & ( ( (Pullman|TriCities) & ~Summer) | (FreshmanStanding & Vancouver) )

GradeSubmission: (((Pullman | TriCities)) & ~Summer) | (Vancouver & FreshmanStanding)) & ( NewTransferStudent | CreditsUnder28) & ~SeniorCredits

Not only were these kind of a mess, but they weren't even equivalent equations. So, I basically just threw them out and spent some time designing and testing a new equation. This included a few false starts, as I tried to keep the equation as simple as possible and didn't initially include all the rules I needed to, not to mention a few other projects needing direct attention, but eventually I was able to come up with what I believe is the simplest equation that fulfills our rules.

Here is what I came up with: ~Summer & ~GradStudent & ( ( ~(Vancouver | Spokane) & ( (FreshmanStanding & CreditsUnder28) | (NewTransfer & ~SeniorCredits) ) ) | (Vancouver & FreshmanStanding & CreditsUnder28) )

This equation made me wish so, so much that Vancouver didn't have to be different from Pullman or the Tri-Cities. No such luck though, and I ended up with a more complex equation than I really wanted. Still, I think it's clearer than the old ones, and since it's the same everywhere it's used, it should be easier to maintain. Plus, any new campuses will automatically be included, as long as they follow the 'standard' method of determining midterm eligibility.
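
For what it's worth, here is roughly what that equation looks like as a plain boolean predicate.  This is just an illustrative sketch with made-up flag names matching the terms above, not the actual code in either application:

// Hypothetical helper; in reality each flag would come from the student's
// enrollment records rather than being passed in individually like this.
static bool GetsMidtermGrade(
    bool summer, bool gradStudent, bool vancouver, bool spokane,
    bool freshmanStanding, bool creditsUnder28,
    bool newTransfer, bool seniorCredits)
{
    // The 'standard' campuses (everywhere except Vancouver and Spokane).
    bool standardCampusRule = !(vancouver || spokane) &&
        ((freshmanStanding && creditsUnder28) ||
         (newTransfer && !seniorCredits));

    // Vancouver only gives mid-term grades to freshmen under 28 credits.
    bool vancouverRule = vancouver && freshmanStanding && creditsUnder28;

    return !summer && !gradStudent && (standardCampusRule || vancouverRule);
}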

Then, of course, came the minor challenge of translating my Boolean Algebra equation into a SQL statement that amounted to the same thing. Most of my Boolean variables could be represented very simply in SQL, as I had kept the data organization in mind when designing the equation, but the NewTransfer variable had to be represented by looking at 3 different fields in the database, which also required some boolean algebra itself. That term basically expanded to (NoWSUCredits & HasCredits). Actually, this wouldn't have been a problem at all, but the data is stored in the database as strings, which required me to do a lot of unnecessary CASTing. If the data had been stored correctly according to its type, things would have been much easier. Maybe there is a fringe case that explains why it was organized the way it is, but I haven't seen it.

Transformers

Catherine decided we needed to go see Transformers last night at the second-run theater.  I wasn't completely opposed.  I wasn't a huge Transformers fan back when I was young; I don't really know why.  I've never much liked Michael Bay's directing, either.  Armageddon was okay, but that was probably in spite of the direction.  Still, I wanted to see it, and I'm glad I did so when I wasn't paying $7 a head to do it.

Still, it was a Transformers movie, and that means there are certain things you can expect.  Fights between building-scale robots, the Decepticons having stupid names and hiding in weapons of war, the Autobots going out of their way to defend the innocent.  Optimus Prime being honorable to a fault, Megatron being bent on dominating the universe, Optimus Prime never backing down from a fight even when he's bound to lose, and the Autobots' willingness to sacrifice everything in order to defeat the Decepticons.

Unfortunately, it was also pretty standard Hollywood fare, meaning that there was a list of other things to expect.  At least one "Bush is dumb" joke would be made, parents would be seen as a foolish nuisance, though the male lead is kind of a dork the love interest will be a supermodel who is able to see past their differences, and the Black soldier will be the first to die, and we probably won't have any reason to care about that.

Overall, the movie was decent.  If you went in looking for Giant Robots fighting each other, and Robots transforming into things that are familiar to us, you could definitely expect that.  In my opinion, the fights were kind of a cop-out, though, as the film would usually only show glimpses of the fight.  Sure, there were a lot of glimpses, but it's almost as if the fights were deliberately pushed into the background.  Plus, the one death in the movie, Jazz, doesn't even have a good fight to go along with it.  Megatron just tears him in half.  Again, not that we have any reason to care about that.  Jazz had spoken jive maybe twice, and I think he was deliberately kept in the background most of the time.

Plus, despite Optimus Prime's willingness to sacrifice himself to destroy the life-giving box that Megatron so desired, he ends up not having to, as somehow the Hero manages to use the box to destroy Megatron.  I know a lot of people my age still remember the death of Optimus Prime in the 1980s movie as a powerful moment.  I'm still split about whether or not I think Optimus Prime should have died in this movie.  Had he, Megatron probably would have destroyed the Earth, but Optimus Prime would have died a death that truly meant something, in stopping the flow of the Decepticons across the universe.

In the end, this movie was just more Hollywood fare.  The Heroes triumph without any meaningful losses, no great sacrifice is required for victory, and all the good guys live happily ever after.  The movie was fun to watch, but I don't think it's anything more than a renter.  Watch it once and set it aside, and get the original Transformers Movie, where Heroes are truly Heroes.

Microsoft "open sources" .NET?

Looks like Microsoft has decided to release more source under their "look-but-don't-touch" license, the Microsoft Reference License. While this should be great for .NET developers who are busily debugging their applications, I see this as being nothing but trouble for the Mono project. Miguel de Icaza seems to agree, though maybe not as pessimistically as I do.

This trend towards openness at Microsoft, such as their working with Mono on Silverlight/Moonlight, really makes me happy. However, I'm really afraid that this release is going to cause trouble for the Mono project. Sure, their contribute page says that "if you've looked at the source, we can't accept your patches," but with such a widespread release, how can you be sure? What will Microsoft do if a person contributes code to Mono when they've seen the Microsoft .NET code? How will Mono protect themselves from this? I'm really curious, and I hope that this doesn't cause problems for Mono, but I agree with Miguel: Microsoft should go for a more liberal license. If you're going to release your source, be truly Open Source, not some lie of a facsimile.

Microsoft Hates Bug Reporters

At work yesterday, a printing error on one of our Websites was brought to my attention by our Tech Support guy. Basically, a single page in our Schedules of Classes wasn't printing correctly in Internet Explorer 7 (though it worked fine in Firefox). The page only printed the Header and Footer blocks, leaving the actual content blank. It was strange, because it was working fine on my computer, but not on the one used by our tech support. So, I buckled down to try to figure out this bug.

I set up our testing server to output the page with the Printing style-sheet, which, if it was a problem in the CSS, should have reproduced the error. It didn't. We did some further testing in Print Preview on IE7, and noticed that the error only occurred when the scaling was set to "Shrink to Fit". Again, I was confused why the error wasn't apparent on my screen, as I shared Printer drivers with a machine where the error was occurring, and our IE versions were exactly the same. We proved that it wasn't a Microsoft Vista issue, as the error still occurred on one of the Windows XP boxes in our office. At this point, I felt I had pretty conclusive evidence that this was a bug in Internet Explorer 7, which clearly wasn't triggered very often.

I began my search to determine if the bug was already reported. The Microsoft Support Knowledge Base only contained two issues related to Shrink to Fit and IE7, and they seemed to be pretty specific to printing e-mail from Outlook and Outlook Express, and weren't "fails to render" errors. So, I began searching for how to submit a bug. I tried Microsoft Support's Contact Us page. I tried the Contact a Support Professional page, which might have let me submit a report, but would have cost me (or my organization) $59.00 to even talk to someone about the bug. Not acceptable. I understand that Microsoft wants to keep down the number of bogus bug reports, but hiring a team of triagers to filter through and translate the reports (for language issues) can't possibly be that cost-ineffective for Microsoft, especially since a good triager could probably close most faulty bug reports without spending a lot of time on them. If the bug database were searchable and viewable by the public, that would help cut down on the faulty bug reports.

Next, I turned to Google. Surely, I wasn't the only person who ever wanted to report a bug with a Microsoft product. What do I find? Not one, but two blog posts about people being completely unable to submit a bug report to Microsoft without jumping through a ton of hoops! Now, for us, this is only occurring on one of a hundred or so pages, and there is an easy work-around, so I'm not in favor of paying Microsoft to do their bug searching for them. One of the comments on that second blog post suggested that "Microsoft was looking into improving its bug reporting processes," but given the amount of IM-style English used, I really, really doubt that the comment was from anyone who worked at Microsoft, and it certainly wasn't from anyone with the power to affect policy.

I am immensely glad that I don't buy Microsoft software, and that I've convinced my fiancée to get an Ubuntu Linux powered laptop from Dell. I've often recommended Dell laptops to friends, and while the Linux offerings they have today don't fulfill my wants or needs, they're absolutely perfect for Catherine and her Molecular Phylogenetic processing. Plus, it's cheaper by a significant margin (~$700) than a MacBook Pro with identical hardware. And that's with Dell Support.