Friday, February 06, 2009

Naive Bayesian Classification

I was naive. I thought Naive Bayesian Classification would help. I coded up a rough-and-ready version of it, gave it 30% of my data for training, and then saw what it did with the remaining 70%. There are 7 categories, so a really naive classifier (say, one that guesses uniformly at random) would get about 14% of the answers right. My classifier got 4%.
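For anyone who wants to try the same experiment, here is a minimal sketch of a multinomial naive Bayes classifier. It is not the original code. It assumes each example is a list of words paired with a category label, and the function names (`train_naive_bayes`, `classify`, `accuracy`, `split`) and the toy data are hypothetical. It uses add-one (Laplace) smoothing and defaults to a 30/70 train/test split to match the setup above.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train a multinomial naive Bayes model.

    docs: list of (words, category) pairs, where words is a list of strings.
    """
    category_counts = Counter(cat for _, cat in docs)
    word_counts = defaultdict(Counter)  # category -> Counter of word frequencies
    for words, cat in docs:
        word_counts[cat].update(words)
    vocab = {w for words, _ in docs for w in words}
    log_priors = {c: math.log(n / len(docs)) for c, n in category_counts.items()}
    totals = {c: sum(word_counts[c].values()) for c in category_counts}
    return log_priors, word_counts, totals, vocab


def classify(model, words):
    """Return the category with the highest posterior log-probability."""
    log_priors, word_counts, totals, vocab = model
    best_cat, best_score = None, float("-inf")
    for cat, log_prior in log_priors.items():
        score = log_prior
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing keeps an unseen (word, category)
            # pair from driving the whole product to zero.
            score += math.log((word_counts[cat][w] + 1) / (totals[cat] + len(vocab)))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


def accuracy(model, docs):
    """Fraction of docs whose predicted category matches the true one."""
    return sum(classify(model, words) == cat for words, cat in docs) / len(docs)


def split(docs, train_fraction=0.3, seed=0):
    """Shuffle and split docs into (train, test); 30% train by default."""
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data, just to show the calls.
toy_docs = [
    ("fire engine smoke".split(), "fire"),
    ("smoke alarm fire".split(), "fire"),
    ("runway cleared land".split(), "atc"),
    ("cleared takeoff runway".split(), "atc"),
]
toy_model = train_naive_bayes(toy_docs)
print(classify(toy_model, ["smoke", "fire"]))  # fire
```

With real data you would call `train, test = split(docs)` and then `accuracy(train_naive_bayes(train), test)`. A result below `1 / 7` (about 0.14) means the model is doing worse than random guessing.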

I think the classifier is failing because the vocabulary is very small: only about 400 distinct words are used in total, and there is a lot of overlap between categories. I'm not sure more data would help. The domain uses a highly regularized vocabulary (think air traffic control or emergency dispatch).
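One way to check the overlap explanation is to measure how much the per-category vocabularies share. The sketch below assumes the same hypothetical list of (words, category) pairs; it reports the total vocabulary size and the pairwise Jaccard overlap between categories. Values near 1.0 mean two categories use nearly the same words, which leaves a bag-of-words classifier little to go on. The function names and example phrases are illustrative only.

```python
from itertools import combinations


def category_vocabularies(docs):
    """Map each category to the set of distinct words used in it."""
    vocabs = {}
    for words, cat in docs:
        vocabs.setdefault(cat, set()).update(words)
    return vocabs


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_report(docs):
    """Return (total vocabulary size, {(cat1, cat2): Jaccard overlap})."""
    vocabs = category_vocabularies(docs)
    total_vocab = set().union(*vocabs.values())
    pairs = {
        (c1, c2): jaccard(vocabs[c1], vocabs[c2])
        for c1, c2 in combinations(sorted(vocabs), 2)
    }
    return len(total_vocab), pairs


# Hypothetical example: two categories sharing one word out of five.
toy_size, toy_pairs = overlap_report([
    (["cleared", "to", "land"], "landing"),
    (["cleared", "for", "takeoff"], "takeoff"),
])
print(toy_size, toy_pairs)  # 5 {('landing', 'takeoff'): 0.2}
```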

But I really enjoyed the exercise.
