A few weeks ago, I started working on a software package designed to make it easy to take notes. I won't go into the details here just now, but I do want to mention the key technology, which I hope will set my application apart: natural language text processing.
The idea, mostly unimplemented at this point, is to have the software take blobs of text (i.e., notes), run them through some fancy natural language analysis algorithms, and automatically create useful relationships between notes. For instance, if I had written a note to myself saying, "Never buy Skippy peanut butter again -- it was too sweet," and then composed a shopping list that included, "Buy peanut butter," the software would automatically annotate the shopping list to remind me what sort of peanut butter not to buy.
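To make the peanut butter example concrete, here is a minimal sketch of the crudest possible way to relate two notes: counting shared content words. This is not how my application works, and it involves no real language analysis. The stop-word list, the function names, and the `min_shared` threshold are all made up for illustration.

```python
import re

# Hypothetical stop-word list; a real system would need a much larger one.
STOP_WORDS = {"a", "an", "the", "it", "was", "too", "never", "again",
              "buy", "to", "of", "some"}

def content_words(text):
    """Lowercase the text and return its set of words, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS}

def related_notes(new_note, existing_notes, min_shared=2):
    """Return the existing notes that share at least `min_shared`
    content words with the new note."""
    target = content_words(new_note)
    return [note for note in existing_notes
            if len(target & content_words(note)) >= min_shared]

notes = [
    "Never buy Skippy peanut butter again -- it was too sweet",
    "Call the dentist on Monday",
]

# The shopping-list entry shares "peanut" and "butter" with the first note.
matches = related_notes("Buy peanut butter", notes)
```

Word overlap like this only finds the related note. It has no idea that the note is a warning rather than a recommendation, which is exactly the part that needs real sentence analysis.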
Last night, I thought I had an early first version working reasonably well. I've been using some freely available sentence-parsing software, with my own additions, to analyze blocks of input. It was far from perfect, but it did manage to create most of the basic relationships I thought it should. So I decided to test the system against some "real world" inputs by running some publicly published to-do lists from tadalists through my application.
The results robbed me of my confidence. Not everyone, it seems, composes their notes using scrupulously correct grammar. In particular, most people omit the articles before the nouns in their to-do lists, which really tripped up the sentence parser I was using.
This presents at once a huge challenge and an interesting learning opportunity. I really don't have a choice but to write my own sentence parser, to turn "The cow ate some grass" or "Buy box of dishsoap" into a tree structure that an unintelligent computer can understand. Even in the summer, it seems, I will spend a not insignificant amount of money on academic texts.
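As a rough illustration of what "a tree structure" means here, the following is a toy recursive-descent parser that treats the article as optional and accepts verb-first (imperative) sentences. It is only a sketch under heavy assumptions: the lexicon covers just the two example sentences, the grammar (S, NP, VP, PP) is a textbook simplification, and every name in it is invented for the example. It is not the parser I was using, nor the one I intend to write.

```python
# Hypothetical toy lexicon: just enough words for the two example sentences.
LEXICON = {
    "the": "Det", "some": "Det",
    "cow": "N", "grass": "N", "box": "N", "dishsoap": "N",
    "ate": "V", "buy": "V",
    "of": "P",
}

def parse(sentence):
    """Parse a sentence into nested tuples using this toy grammar:
         S  -> NP VP | VP          (a bare VP covers imperatives)
         VP -> V NP
         NP -> [Det] N [PP]        (the article is optional)
         PP -> P NP
    """
    tokens = sentence.lower().split()
    tags = [LEXICON[t] for t in tokens]  # raises KeyError on unknown words
    pos = 0

    def peek():
        return tags[pos] if pos < len(tags) else None

    def take(tag):
        nonlocal pos
        if peek() != tag:
            raise ValueError(f"expected {tag} at token {pos}")
        leaf = (tag, tokens[pos])
        pos += 1
        return leaf

    def noun_phrase():
        children = []
        if peek() == "Det":          # tolerate a missing article
            children.append(take("Det"))
        children.append(take("N"))
        if peek() == "P":
            children.append(("PP", take("P"), noun_phrase()))
        return ("NP", *children)

    def verb_phrase():
        return ("VP", take("V"), noun_phrase())

    if peek() == "V":                # imperative: no subject
        tree = ("S", verb_phrase())
    else:
        tree = ("S", noun_phrase(), verb_phrase())
    if pos != len(tokens):
        raise ValueError("unparsed trailing words")
    return tree

full = parse("The cow ate some grass")
terse = parse("Buy box of dishsoap")
```

Under these assumptions, `terse` comes out as a sentence containing only a verb phrase, whose object "box" carries the prepositional phrase "of dishsoap", with no articles required anywhere. The hard part, of course, is everything this sketch leaves out: unknown words, ambiguity, and a grammar bigger than four rules.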