Named Entity Recognition, or NER as it is commonly known within the NLP community, is the task of identifying named entities, such as the names of places, organizations and people, as well as other entities such as dates and time zones, in free-form text. NER is useful in many settings, from improving search results to performing tasks that are critical to the enterprise, such as automatic summarization.
For example, consider the following free-form text from Yahoo! News:
John McCain's presidential campaign on Wednesday released a withering television ad comparing Barack Obama to Britney Spears and Paris Hilton, suggesting the Democratic contender is little more than a vapid but widely recognized media concoction
Knowing that entities like John McCain, Barack Obama, Britney Spears and Paris Hilton are names of people helps us understand the text better, because it adds semantic information that text mining engines can use. Likewise, knowing that Wednesday is a day could be useful in building systems that, say, generate an automatic summary of the political season in the United States.
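To make the idea concrete, here is a minimal Python sketch of the simplest possible approach, a dictionary (gazetteer) lookup over the sentence above. The `GAZETTEER` table and the `tag_entities` function are hypothetical and written only for this example; they are not taken from any particular NER system.

```python
import re

# Hypothetical gazetteer: a hand-written table mapping known names to entity types.
GAZETTEER = {
    "John McCain": "PERSON",
    "Barack Obama": "PERSON",
    "Britney Spears": "PERSON",
    "Paris Hilton": "PERSON",
    "Wednesday": "DATE",
}

SENTENCE = (
    "John McCain's presidential campaign on Wednesday released a withering "
    "television ad comparing Barack Obama to Britney Spears and Paris Hilton, "
    "suggesting the Democratic contender is little more than a vapid but "
    "widely recognized media concoction"
)


def tag_entities(text, gazetteer):
    """Return (start, end, surface text, type) for every gazetteer hit, in text order."""
    entities = []
    for name, label in gazetteer.items():
        # \b anchors the match to word boundaries so a name is not found inside a longer word.
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append((m.start(), m.end(), m.group(), label))
    # Sorting the tuples orders the hits by their start offset.
    return sorted(entities)


for start, end, surface, label in tag_entities(SENTENCE, GAZETTEER):
    print(f"{surface!r:18} {label}")
```

The obvious weakness is that a lookup table only finds names that are already in it; it has no way to use the surrounding words to recognize a name it has never seen.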
Now, I was recently writing up a short paper for an upcoming conference and happened to paste some text from MS Word into an HTML text box. And guess what? MS Word seems to have at least a basic named entity recognition engine.
Consider this example:
Mountain View, CA is at the heart of Silicon Valley and is home to a lot of internet companies and startups such as Google, Meebo.com and SmugMug.com.
The XML that Microsoft Word generated:
<?xml:namespace prefix = st1 /><st1:City w:st="on"><?xml:namespace prefix = w /><w:r><w:t>Mountain View</w:t></w:r></st1:City>
<w:r><w:t>, </w:t></w:r>
<st1:State w:st="on"><w:r><w:t>CA</w:t></w:r></st1:State>
<w:r><w:t>is the heart of </w:t></w:r>
<st1:place w:st="on"><w:r><w:t>Silicon Valley</w:t></w:r></st1:place>
<w:r><w:t>and is home to many internet companies such as Google, Meebo.com and SmugMug.com. </w:t></w:r>
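To see how the recognized entities sit in this markup, here is a minimal Python sketch that pulls the smart-tag elements back out using the standard library's `xml.etree.ElementTree`. Word's `<?xml:namespace ... />` lines only name the prefixes and are not well-formed XML, so the sketch assumes a cleaned-up copy wrapped in a hypothetical `<doc>` root; the two namespace URIs are assumptions, not something taken from the paste.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the pasted markup only names the prefixes st1 and w.
ST1 = "urn:schemas-microsoft-com:office:smarttags"
W = "urn:schemas-microsoft-com:office:word"

# Cleaned-up copy of the markup above, wrapped in a hypothetical <doc> root
# element that declares both prefixes so the parser accepts it.
SNIPPET = f"""<doc xmlns:st1="{ST1}" xmlns:w="{W}">
<st1:City w:st="on"><w:r><w:t>Mountain View</w:t></w:r></st1:City>
<w:r><w:t>, </w:t></w:r>
<st1:State w:st="on"><w:r><w:t>CA</w:t></w:r></st1:State>
<w:r><w:t>is the heart of </w:t></w:r>
<st1:place w:st="on"><w:r><w:t>Silicon Valley</w:t></w:r></st1:place>
<w:r><w:t>and is home to many internet companies such as Google, Meebo.com and SmugMug.com. </w:t></w:r>
</doc>"""


def smart_tag_entities(xml_text):
    """Return (tag name, text) pairs for every top-level st1:* element."""
    root = ET.fromstring(xml_text)
    entities = []
    for child in root:
        # ElementTree spells namespaced tags as "{namespace-uri}localname".
        if child.tag.startswith("{" + ST1 + "}"):
            label = child.tag.split("}", 1)[1]
            # Concatenate the text of every w:t run inside the smart tag.
            text = "".join(t.text or "" for t in child.iter("{" + W + "}t"))
            entities.append((label, text))
    return entities


print(smart_tag_entities(SNIPPET))
# [('City', 'Mountain View'), ('State', 'CA'), ('place', 'Silicon Valley')]
```

Plain text runs (`w:r` elements outside any `st1:` wrapper) are skipped, which is exactly where Google, Meebo.com and SmugMug.com end up.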
As one can see, Word tags the places accurately but does not recognize Google, Meebo and SmugMug as organizations, which suggests that it is probably nothing more than a simple regex. Nevertheless, it is pretty darn good that Word 2003 can do this at all, given that it is at least five years old and dates from a time when most of this technology was still in its infancy. I'm not sure how advanced this piece of technology is in the latest version of Word, but seeing NLP techniques in places you don't expect them is a good sign that we are moving toward more intelligent software.
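Out of curiosity, here is what a "simple regex" place tagger of the kind guessed at above might look like. This is a hypothetical Python sketch, not Word's actual rule set: the `CITY_STATE` pattern and the `KNOWN_PLACES` list are invented for illustration.

```python
import re

# Hypothetical rule: one or more capitalized words, a comma, then a
# two-letter upper-case code such as "CA". Not Word's real implementation.
CITY_STATE = re.compile(r"\b((?:[A-Z][a-z]+ )*[A-Z][a-z]+), ([A-Z]{2})\b")

# Hypothetical list of place names that no simple pattern can describe.
KNOWN_PLACES = ["Silicon Valley"]

SENTENCE = (
    "Mountain View, CA is at the heart of Silicon Valley and is home to a lot "
    "of internet companies and startups such as Google, Meebo.com and SmugMug.com."
)


def tag_places(text):
    """Return (label, text) pairs for City/State patterns and listed places, in text order."""
    found = []
    for m in CITY_STATE.finditer(text):
        found.append((m.start(1), "City", m.group(1)))
        found.append((m.start(2), "State", m.group(2)))
    for place in KNOWN_PLACES:
        for m in re.finditer(re.escape(place), text):
            found.append((m.start(), "place", place))
    # Sort by start offset, then drop the offsets.
    return [(label, span) for _, label, span in sorted(found)]


print(tag_places(SENTENCE))
# [('City', 'Mountain View'), ('State', 'CA'), ('place', 'Silicon Valley')]
```

On the example sentence it tags the same three spans Word did and, like Word, leaves Google, Meebo.com and SmugMug.com alone, since no rule describes what an organization name looks like. That agreement fits the guess, but it does not prove how Word actually works.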
Signing Off,
Vishnu Vyas