Does MS Word do Named Entity Recognition?


Named Entity Recognition, or NER as it is commonly known within the NLP community, is the task of identifying named entities such as names of places, organizations and people, or other entities such as dates and time zones, in free-form text. NER is useful in many places, from improving search results to enterprise-critical tasks such as automatic summarization.

For example, take the following free-form text from Yahoo! News:

John McCain's presidential campaign on Wednesday released a withering television ad comparing Barack Obama to Britney Spears and Paris Hilton, suggesting the Democratic contender is little more than a vapid but widely recognized media concoction

Knowing that entities like John McCain, Barack Obama, Britney Spears and Paris Hilton are names of people helps us understand the text better. It gives text mining engines more semantic information to work with. Also, knowing that other entities such as Wednesday are days could be useful in building systems that can, say, generate an automatic summary of the political season in the United States.
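To make the idea concrete, here is a minimal sketch of the simplest possible tagger, a lookup against hand-made lists, run on the sentence above. The `GAZETTEER` dictionary and the `tag_entities` function are names I made up for illustration; real NER systems use much larger resources and statistical models, so treat this as a toy.

```python
import re

# Hypothetical, hand-made lists for illustration only.
GAZETTEER = {
    "PERSON": ["John McCain", "Barack Obama", "Britney Spears", "Paris Hilton"],
    "DAY": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
}

def tag_entities(text):
    """Return (start, surface, label) tuples for every list hit, in text order."""
    hits = []
    for label, names in GAZETTEER.items():
        for name in names:
            # \b stops a name from matching inside a longer word.
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text):
                hits.append((m.start(), m.group(0), label))
    return sorted(hits)

text = ("John McCain's presidential campaign on Wednesday released a withering "
        "television ad comparing Barack Obama to Britney Spears and Paris Hilton")

entities = tag_entities(text)
for start, surface, label in entities:
    print(f"{start:3d}  {label:6s}  {surface}")
```

A lookup like this only finds names it already knows, which is exactly why real NER is hard: new people and companies show up in text every day.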

Now, I was recently writing up a short paper for an upcoming conference and happened to paste some text from MS Word into an HTML text box. And guess what? MS Word seems to have at least a basic named entity recognition engine.

Consider this example:

Mountain View, CA is at the heart of Silicon Valley and is home to a lot of internet companies and startups such as Google, Meebo.com and SmugMug.com.

Here is the XML that Microsoft Word generated:

<?xml:namespace prefix = st1 />
<st1:City w:st="on">
  <?xml:namespace prefix = w />
  <w:r><w:t>Mountain View</w:t></w:r>
</st1:City>
<w:r><w:t>, </w:t></w:r>
<st1:State w:st="on">
  <w:r><w:t>CA</w:t></w:r>
</st1:State>
<w:r><w:t>is the heart of </w:t></w:r>
<st1:place w:st="on">
  <w:r><w:t>Silicon Valley</w:t></w:r>
</st1:place>
<w:r><w:t>and is home to many internet companies such as Google, Meebo.com and SmugMug.com. </w:t></w:r>

As one can see, it is not able to recognize Google, Meebo and SmugMug as organizations, and the fact that it handles places accurately suggests that it's probably nothing more than a simple regex. Nevertheless, Word 2003 is at least five years old, and that it could do this when most of this technology was pretty much in its infancy is pretty darn good. I'm not sure how advanced this piece of technology is in the latest version of Word, but seeing NLP techniques in places you don't expect them is a good sign that we are moving toward more intelligent software.
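If you want to see exactly what Word tagged, a few lines of Python will pull the tagged spans out of the pasted markup. This is only a sketch: the pasted fragment is not well-formed XML (no namespace declarations), so I use regular expressions rather than an XML parser, and `MARKUP` and `smart_tags` are names I made up. The patterns assume the tags look like the ones in my example and are not nested.

```python
import re

# Abbreviated copy of the markup from the example above.
MARKUP = (
    '<st1:City w:st="on"><w:r><w:t>Mountain View</w:t></w:r></st1:City>'
    '<w:r><w:t>, </w:t></w:r>'
    '<st1:State w:st="on"><w:r><w:t>CA</w:t></w:r></st1:State>'
    '<w:r><w:t>is the heart of </w:t></w:r>'
    '<st1:place w:st="on"><w:r><w:t>Silicon Valley</w:t></w:r></st1:place>'
    '<w:r><w:t>and is home to many internet companies such as Google, '
    'Meebo.com and SmugMug.com. </w:t></w:r>'
)

# An st1:* element: capture the tag name and everything up to its matching close.
SMART_TAG = re.compile(r"<st1:(\w+)[^>]*>(.*?)</st1:\1>", re.DOTALL)
# A text run inside a w:t element.
TEXT_RUN = re.compile(r"<w:t>(.*?)</w:t>", re.DOTALL)

def smart_tags(markup):
    """Return (tag_name, text) pairs for each st1:* element in the markup."""
    pairs = []
    for m in SMART_TAG.finditer(markup):
        text = "".join(TEXT_RUN.findall(m.group(2))).strip()
        pairs.append((m.group(1), text))
    return pairs

for tag, text in smart_tags(MARKUP):
    print(f"{tag:6s} {text}")
```

Run on the example, this lists the city, the state and the place, and nothing for Google, Meebo.com or SmugMug.com, which matches what the raw markup shows.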

Signing Off,

Vishnu Vyas
