Object properties – intrinsic vs extrinsic

From Slashdot | Are You Proud of Your Code?

That’s Not Good Code (Score:5, Interesting)

by severoon (536737) on Monday December 10, @03:31PM (#21646909)
(Last Journal: Tuesday September 14 2004, @03:59PM)

The biggest thing I see wrong with other people’s code is not at the syntax level or to do with commenting. It’s based on a misunderstanding of basic CS principles. An example is in order from last week…

In the codebase I work on, there’s a module that analyzes documents and tags them, assigning different weights to each tag based on relevance. (Think of the way google works–it reads a web page and tags it with terms, then if you type in one of those terms while doing a search that document comes up–yes I know this isn’t the way google actually works, save it. :-p ) At one point we send a list of tagged documents from one system to another, and this area was the source of many, many bugs. The responsible developer spent the better part of last week slaving on this code and every change seemed to introduce more bugs than it removed. Finally, I got involved to see what the problem was.

Here is one thing (out of many) that I found. After the docs arrive in the new system, each doc is supposed to be persisted along with the top 5 most relevant tags (the rest discarded). The code was written to create a Document, check a hash table that maps the doc to the number of tags it currently has, add a tag if that doc doesn’t yet have five, then increment the value associated with that doc in the hash table. When I saw this, my head almost exploded. At some point, this developer thought it would be a good idea to create a hash table and keep this information, information which is available in the document itself (he could’ve just called doc.getTags().size() to see how many tags it currently had). Now he created this hash table and all his code was written to depend on it, so of course he had to write a lot of code to keep it in sync with the state of all these documents.

This sounds like a simple enough thing, right? It’s not necessary, and it’s not the best thing, but it’s a fairly simple mistake and one that couldn’t impact code readability all that much, right? Maybe–but consider that this is one of about 10 simple mistakes I found, and you can imagine the explosion of interactions of all these simple mistakes…and that’s why we burned a person-week on something that should’ve been trivial. When I pointed out to this developer that he could just get the number of tags directly from the doc itself, and doesn’t need to keep this state in some other object too, he said something to the effect of, “That’s a different approach, but whatever…one’s not better than the other.”

But one is better. If this developer understood the difference between intrinsic and extrinsic, he never would’ve written that code in the first place much less defended it. To put a fine point on it: the number of tags associated with a document is intrinsic to the document itself, so that is where the information should live…not there as well as some hash table somewhere. The document is the authority and the final word on how many tags it has at any moment in time. (There’s a principle in databases called the SUA Principle–it means one should keep data in a Single place, in an Unambiguous manner, and that should be the Authoritative source of that data and no other. It applies here too.) Putting this info into some other object, even if that object exists solely for the purpose of tracking that info, means you’re creating an object that stores information that is extrinsic to it. Never a good idea…now you need a whole bunch of supporting code that keeps the extrinsic info in sync at all times.

Let’s say I’m designing a Ball class for use in a physics application that students learning physics can use. They can shoot the ball out of a cannon, put it under water, in deep space, on Jupiter, etc, and see how the simulation behaves. As the developer of this class, I decide to add a characteristic to the ball that keeps info about its “heaviness”. What should I add, a getWeight() method or a getMass() method? The developer I was talking to apparently wouldn’t see the difference–one can be derived from the other, so put both, or just pick one, whatever.

Actually, there’s a huge difference. getMass() is the right answer. Mass is intrinsic to the ball, it doesn’t change based on where the ball is. Weight is different on Earth, Venus, deep space, I’m not even sure what that value would be if the ball were under water and buoyant. Change the ball’s environment, the weight would have to be constantly updated to keep it in sync. Mass never changes. But let’s forge ahead anyway and add getWeight(). How could such a small design mistake cause big problems?

For one, other objects that previously didn’t need to know about Ball now do. Now something else in the system has to call setWeight() on when the ball needs an update. Or, we could design getWeight() to take an argument, say: getWeight(Environment e). Now the ball can figure out its weight by itself and return it. Problem solved right? Not really…now I created a dependency of Ball on the Environment class. A ball’s a simple class, I should be able to compile it by itself…not anymore, though. The compiler now needs access to whatever code defines the Environment object. In fact, it’s quite likely that the Environment object has to track the objects it contains, so now I’ve created a circular dependency. And on and on and on…

(By the way, I submit that the code block you’ve posted is not good code, at least if we consider Roedy Green’s opinion of Hungarian Notation. If you haven’t read this: How to Write Unmaintainable Code [mindprod.com] then you’re missing out.)

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]

Comments are closed.