Assorted Afflatuses

June 2009

Brain Teaser

By Joseph Kibe on 29 June 2009 11:00 AM

In the course of some research on the Internet, I came across a San Francisco-based firm called Loomia, which provides companies with software to do those "If you liked this, then you'll also like..." blurbs on their websites. Curious about the company, I poked around their website to learn more, which led me to their "Tech Challenge Questions" for prospective employees.

Some of the questions were quite clever, though I really liked one in particular, which I will post here, with some modifications in the phrasing.

Challenge: Consider the set { a, b, c, ... , z } of lowercase letters of the standard English alphabet, and define an operation $ on it. Suppose we know only that

a $ b = ab = a
bc = c
cd = g
ef = u
ee = q
wy = i
wn = a
pq = g
rm = w
zc = y

and that the operation $ is associative with identity element b. What, then, are the values of faulkner, oconnor and welty?

It's not particularly difficult to reverse-engineer the operator, but I thought the puzzle was amusing enough to merit repeating.
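One reading that fits all ten facts is to number the letters from a = 0 to z = 25 and treat $ as multiplication modulo 26. Under that guess b = 1 is the identity and associativity comes for free, since modular multiplication is associative. The puzzle may admit other readings; the sketch below only checks this one against the given facts, and the names `dollar` and `evaluate` are illustrative.

```python
import string

LETTERS = string.ascii_lowercase  # index 0 is 'a', 1 is 'b', ..., 25 is 'z'

def dollar(x, y):
    """Candidate reading of $: multiply the letters' indices modulo 26."""
    return LETTERS[(LETTERS.index(x) * LETTERS.index(y)) % 26]

def evaluate(word):
    """Combine a word's letters left to right with $, starting from the identity b."""
    result = "b"
    for ch in word:
        result = dollar(result, ch)
    return result

# The ten facts from the puzzle, written as (left, right, result).
FACTS = [
    ("a", "b", "a"), ("b", "c", "c"), ("c", "d", "g"), ("e", "f", "u"),
    ("e", "e", "q"), ("w", "y", "i"), ("w", "n", "a"), ("p", "q", "g"),
    ("r", "m", "w"), ("z", "c", "y"),
]
assert all(dollar(x, y) == z for x, y, z in FACTS)

for word in ("faulkner", "oconnor", "welty"):
    print(word, evaluate(word))
```

Running it prints the three answers, so skip it if you would rather solve the puzzle by hand.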

More on Healthcare

By Joseph Kibe on 28 June 2009 9:48 PM

Professor Greg Mankiw, author of my favorite introductory economics text, had a great piece in this morning's New York Times about the economics of the president's proposed healthcare reform package.

The Other You-Know-Who

By Joseph Kibe on 26 June 2009 6:18 PM

Followers of popular culture no doubt associate Joanne Rowling's Lord Voldemort with the pseudonym "you-know-who." Over the last 48 hours, I have adopted the sobriquet to reference a certain recently deceased pop star without actually mentioning the aforementioned pop star's name. (I don't want to give this person any more direct attention.)

But to the real issue. Earlier today the US House passed a monumental — if imperfect — piece of climate change legislation by a narrow margin, just 219 votes for the bill to 212 votes against. Yet, if one visits The New York Times website or the BBC News homepage, articles about you-know-who continue to take center stage.

This really, really bothers me. On the one hand, we have a story about a piece of legislation that represents a major step forward in US policy toward climate change. Moreover, the legislation, if enacted, would have an enormous impact on citizens and corporations. On the other hand, we have the death of a relatively influential person. I don't see how the latter trumps the former in importance.

More Endorsements

By Joseph Kibe on 26 June 2009 5:32 PM

While I have no data to back myself up on this, I at least feel like what I write on this blog is more often negative than positive. Without a doubt, I can be a very critical person, but it's not as if I have a reason to dislike everything around me!

So here are some additional endorsements.

I joined Twitter back in October, and I've been a fan ever since. It's simple, elegant and just generally wonderful. What makes Twitter even better, of course, is the universe of available applications that leverage Twitter's openness. On the desktop, I tried TweetDeck, but found its footprint on my computer bothersome. I prefer a piece of software called Tweetie from the fine folks at atebits. (A very clever name for a software company.) On the iPhone, I like Twitterrific.

Also, I recently had a chance to try out the Rosetta Stone language learning product. After playing with the French package, I'm convinced it's the best way to at least begin learning a second language, short of total immersion in another country or specialized classroom setting. In particular, I was quite impressed by the software's voice recognition capabilities, which did an excellent job of rejecting my input when I gave my French a strong American accent.

Tentative Victory

By Joseph Kibe on 24 June 2009 10:45 PM

I'm not a big sports person. But I've always wondered a) why Americans do not like soccer as much as people in other countries do and b) why America has such a terrible national soccer team.

I was in France for the bulk of the 2006 World Cup, and, upon learning I was an American, few people missed the chance to mock the performance of the US team in that competition. (Recall that the US only managed to tie in its match against Italy when one of the Italian players inadvertently nudged the ball into the Italian goal for the Americans.)

So I must take this moment to congratulate the US team on their remarkable 2-0 victory against top-ranked Spain at the Confederations Cup in South Africa earlier today. It may compel me to actually watch the final round of the Cup later in the week.

Worser Grammar

By Joseph Kibe on 24 June 2009 9:48 PM

A few days ago I mentioned my effort to write some methods for automated sentence parsing. Specifically, I had in mind the goal of creating parsing algorithms that could still do a reasonably good job even when the author omits common grammatical structures.

It turns out that writing such a set of algorithms is no walk in the park. In particular, I now see why the software I had been using failed so miserably when I threw it sentences missing some key parts of speech.

At least in English, many words can serve as more than one part of speech. Take the word "cue." It could be a noun (an implement for playing pool, a signal, a hint), but it could also be a verb, as in "He cued the tape for playback," depending, of course, upon the context.

Context, as I understand it, plays a particularly important role in some algorithms. This class of parsing methods looks at all the word pairs in a sentence and assigns parts of speech accordingly. For instance, given the phrase "a yellow duck," the parser would figure out that "yellow" cannot modify the verb "duck" (as in "duck and cover"), so "duck" is likely a noun and "yellow" an adjective.
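To make the word-pair idea concrete, here is a brute-force sketch: list every part of speech each word could take, then keep only the tag sequences whose neighboring pairs all appear in a table of allowed pairs. The lexicon and pair table are hypothetical toy data, not the rules of the algorithms mentioned above, and real taggers avoid enumerating every combination by using dynamic programming instead.

```python
from itertools import product

# Hypothetical mini-lexicon: the parts of speech each word *could* take.
CANDIDATES = {
    "a": ["DET"],
    "yellow": ["ADJ", "VERB"],  # "the leaves yellow" uses it as a verb
    "duck": ["NOUN", "VERB"],   # "a duck" vs. "duck and cover"
}

# Hypothetical table of tag pairs allowed to sit side by side.
ALLOWED_PAIRS = {
    ("DET", "ADJ"), ("DET", "NOUN"),
    ("ADJ", "ADJ"), ("ADJ", "NOUN"),
    ("NOUN", "VERB"), ("VERB", "DET"), ("VERB", "NOUN"),
}

def consistent_taggings(words):
    """Return every tag sequence whose adjacent pairs all appear in ALLOWED_PAIRS."""
    options = [CANDIDATES[w] for w in words]
    return [
        tags
        for tags in product(*options)
        if all(pair in ALLOWED_PAIRS for pair in zip(tags, tags[1:]))
    ]

print(consistent_taggings(["a", "yellow", "duck"]))
# [('DET', 'ADJ', 'NOUN')]
```

Because "a" can only be a determiner and (ADJ, VERB) is not an allowed pair, the only surviving reading makes "duck" a noun.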

Of course, this approach also failed rather miserably when I subjected it to real-world inputs. The two algorithms I tried depended in many cases upon the presence of determiners to act as a sort of "reference point," since, for example, "the" is only ever a determiner. This enabled the algorithms to make good assumptions about the location of nouns, which in turn forces other words to be verbs, which more or less makes everything fall into place quite nicely.

But as I said, that didn't work. People don't write notes with sufficiently polished grammar to make such approaches work. (Though if I ever need to parse well-written work, I have code that does a pretty good job.)

So I'm trying my own heuristically motivated approach using word frequency data. While I'm working on some fancy probabilistic mumbo jumbo that involves a lot of math, at its core the approach is quite simple.

Take the word "young" as an example. I suspect most English speakers would immediately classify "young" as an adjective, which is true most of the time. "Young" can also be a noun, as in "The young were spared the worst of the battle's ravages." But analyzing a whole bunch of English writing quickly makes it clear that "young" is used far more frequently as an adjective than as a noun.

Thus, my algorithm takes that data and makes some initial assumptions when it looks at the words and phrases in a sentence. My hope is that, with these reference points in place, it will become possible to make good guesses about the rest of the sentence.
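A minimal sketch of the frequency-based reference points, assuming a table of per-word tag counts: a word is tagged with its most frequent part of speech only when that tag clearly dominates, and is otherwise left open for later guessing. The counts are invented placeholders rather than real corpus figures, and the `threshold` parameter is a hypothetical knob, not part of the approach described above.

```python
from collections import Counter

# Hypothetical word -> tag counts. These numbers are invented placeholders,
# standing in for counts gathered from a large tagged body of English text.
TAG_COUNTS = {
    "young": Counter({"ADJ": 95, "NOUN": 5}),
    "buy": Counter({"VERB": 90, "NOUN": 10}),
    "box": Counter({"NOUN": 80, "VERB": 20}),
    "duck": Counter({"NOUN": 55, "VERB": 45}),
}

def anchor_tags(words, threshold=0.75):
    """Tag a word with its most frequent part of speech only when that tag's
    share of the counts reaches `threshold`; otherwise leave it as None."""
    tags = []
    for word in words:
        counts = TAG_COUNTS.get(word.lower())
        if not counts:
            tags.append(None)  # unknown word: no reference point
            continue
        tag, n = counts.most_common(1)[0]
        share = n / sum(counts.values())
        tags.append(tag if share >= threshold else None)
    return tags

print(anchor_tags("Buy box".split()))     # ['VERB', 'NOUN']
print(anchor_tags("young duck".split()))  # ['ADJ', None]
```

The `None` entries are exactly the words a second pass (for example, a word-pair filter) would still need to resolve.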

I also broke down and ordered a copy of Foundations of Statistical Natural Language Processing just to give myself a touch more background on the use of probabilistic methods in natural language parsing.

It's exciting, interesting and — most of all — incredibly frustrating.

Good Reads

By Joseph Kibe on 22 June 2009 10:41 PM

As part of my effort to post to this blog more frequently, I figured it might be a good idea to post some shorter pieces too, as most normal bloggers seem to.

So I have two recommendations for online reading and one book to plug.

First, I've long been a fan of Piled High and Deeper (or PHD) comics, the grad student comic strip. While I'm not a grad student (yet), I really empathize with the fictional students in the strip. It's also funny, if you're an academic geek like me.

Next, I would like to recommend The Art of the Title Sequence blog. It's a blog about movie title sequences, shockingly enough. While some title sequences — and movies for that matter — are banal and uninteresting, there are some movies with title sequences that merit extra attention.

Finally, I just finished reading P.D. James's latest novel The Private Patient. Literary merit aside, I enjoy reading mystery novels. P.D. James weaves the police procedural novel together with a more cerebral, psychoanalytical point of view that adds a great deal of depth. I also enjoy her subtle sense of humor.

Real People, Real Problems

By Joseph Kibe on 21 June 2009 11:43 AM

A few weeks ago, I started working on a software package designed to make it easy to take notes. I won't go into the details here just now, but I do want to mention the key technology, which I hope will set my application apart: natural language text processing.

The idea, mostly unimplemented at this point, is to have the software take blobs of text (i.e., notes), run them through some fancy natural language analysis algorithms and automatically create useful relationships between notes. For instance, if I had written a note to myself saying, "Never buy Skippy peanut butter again -- it was too sweet," and later composed a shopping list that included "Buy peanut butter," the software would automatically annotate the shopping list to remind me which peanut butter not to buy.
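A deliberately naive sketch of the linking idea, swapping in plain word-pair (bigram) overlap for the natural language analysis described above: any note that shares a two-word phrase with a new item is attached to it. The function names and sample notes are hypothetical.

```python
import re

def bigrams(text):
    """Lowercase the text, keep only runs of letters, and return its adjacent word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(zip(words, words[1:]))

def related_notes(item, notes):
    """Return every note sharing at least one two-word phrase with `item`."""
    item_pairs = bigrams(item)
    return [note for note in notes if bigrams(note) & item_pairs]

notes = [
    "Never buy Skippy peanut butter again -- it was too sweet",
    "Call the dentist on Monday",
]
print(related_notes("Buy peanut butter", notes))
# ['Never buy Skippy peanut butter again -- it was too sweet']
```

Matching on word pairs rather than single words keeps a common verb like "buy" from linking every shopping-related note; the shared phrase here is ("peanut", "butter").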

Last night, I thought I had an early first version working reasonably well. I've been using some freely available sentence parsing software with my own additions to analyze blocks of input. It was far from perfect, but it did manage to create most of the basic relationships I thought it should. So I decided to test the system against some "real world" inputs by running some publicly posted to-do lists from tadalists through my application.

It robbed me of my confidence. Not everyone, it seems, composes notes using scrupulously correct grammar. In particular, most people omit the article before the nouns in their to-do lists, which really tripped up the sentence parser I was using.

This presents at once a huge challenge and an interesting learning opportunity. I really don't have a choice but to write my own sentence parser, to turn "The cow ate some grass" or "Buy box of dishsoap" into a tree structure that an unintelligent computer can understand. Even in the summer, it seems, I will spend a not insignificant amount of money on academic texts.
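For a sense of what such a tree structure can look like, here is a toy recursive-descent parser. Its grammar and lexicon are hypothetical and tiny, invented for illustration rather than taken from any real parser: a noun phrase's determiner is optional, and a sentence may start directly with a verb, so both example sentences parse into nested tuples.

```python
# Hypothetical toy grammar:
#   S  -> NP VP | VP        (VP alone covers imperatives like "Buy box ...")
#   NP -> [Det] N [PP]      (the determiner is optional)
#   PP -> P NP
#   VP -> V NP
LEXICON = {
    "the": "Det", "some": "Det",
    "cow": "N", "grass": "N", "box": "N", "dishsoap": "N",
    "ate": "V", "buy": "V",
    "of": "P",
}

def parse(sentence):
    """Parse a sentence into nested (label, children...) tuples."""
    words = sentence.lower().split()
    try:
        tags = [LEXICON[w] for w in words]
    except KeyError as err:
        raise ValueError(f"unknown word: {err.args[0]}") from None
    pos = 0

    def peek():
        return tags[pos] if pos < len(tags) else None

    def take(tag):
        nonlocal pos
        if peek() != tag:
            raise ValueError(f"expected {tag} at position {pos}")
        word = words[pos]
        pos += 1
        return (tag, word)

    def np():
        children = []
        if peek() == "Det":
            children.append(take("Det"))
        children.append(take("N"))
        if peek() == "P":
            children.append(("PP", take("P"), np()))
        return ("NP", *children)

    def vp():
        return ("VP", take("V"), np())

    tree = ("S", vp()) if peek() == "V" else ("S", np(), vp())
    if pos != len(words):
        raise ValueError("unexpected trailing words")
    return tree

print(parse("The cow ate some grass"))
print(parse("Buy box of dishsoap"))
```

Each tuple's first element is the constituent label and the rest are its children, so "Buy box of dishsoap" parses even though "box" has no article.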

The Public Plan

By Joseph Kibe on 18 June 2009 4:35 PM

Healthcare reform has moved back onto the nation's political radar. Earlier in the week, the President expressed his support for the so-called "public option" in a speech to the American Medical Association. The AMA, of course, is one of the leading voices in opposition to such a proposal.

If a healthcare reform package were to include this public option, then, as I understand it, the government would allow all citizens to opt in to an alternative health insurance system operated by the government. The provision's proponents claim that this benefits consumers because it creates more competition in the market for health insurance, thus driving down prices and driving up quality. But this is lunacy.

First, if the government creates a health insurance product that abides by the same rules and regulations, and is constrained by the same economic realities as all the nation's other health insurers, I hardly see how the government plan would bolster competition. The government has no particular advantage when it comes to providing health insurance compared to any other provider. A public healthcare plan would be nothing more than one more homogeneous choice, neither much better nor much worse than the other options available to consumers.

On the other hand, if the government decides to subsidize its healthcare product, then it is quite probable, perhaps even certain, that the public plan would offer consumers a bargain on its face. Of course, if the government subsidizes its plan, then it doesn't create competition in the true sense of the word. The government would have a cost advantage available to no other provider. In fact, many of the same groups in favor of the public option have called similar arrangements anti-competitive in other circumstances.

The only real benefit from healthcare reform would come about if whatever changes the government makes have an impact on the portability of coverage. In the long run, we will pay just as much for healthcare whether we pay for it through a single-payer, government-run system or through our present arrangement, where groups and individuals buy from private insurers. Portability, on the other hand, would reduce a worker's cost of moving from one job to another, which allows labor to be reallocated more quickly: a very good thing indeed.

In short, a public plan would do nothing but give customers one more choice in an already crowded market. The only real economic benefit would come about if we reduce costs and make coverage portable. And if that requires a single-payer system, that's fine with me.