Applied Natural Language Processing: Projects
Lecture 12
Projects
Barbara Rosario
Today
Special guest: Rob Ennals, Intel Labs Berkeley
More project ideas
Next class
Finish up classification
Information extraction
Announcements
Tuesday October 20 assignment 4 due
5% more if submitted at least 24 hours in advance
We'll accept late submissions if:
1) You haven't submitted a previous homework late
And
2) You let me know in advance (by the day before)
Thursday October 15 project proposal due
http://courses.ischool.berkeley.edu/i256/f09/assignments/project_proposal.html
1 page
General idea/topic
(If you know already) what kind of data/resources would
you like to use?
(If you know already) what methods do you think you'll
use?
Projects important dates
Thursday Oct 15: Proposal Due
Thursday October 22: Receive Feedback on Proposal
Thursday October 29: Turn in revised proposal (if required)
Thursday November 12: Check point (more information
later)
Dec 1 and 3: Class Presentations
Thursday Dec 10 (subject to change): Final Project Write-up due
Rob Ennals
Project ideas
Whatever you like and are interested in!
Ideally, it should have at least one of the
following elements:
Interesting, novel application and/or data
e.g., topic classification on the Reuters corpus wouldn't count.
Twitter?
New algorithm
Then you can use the Reuters data
Linguistic analysis
To inform the NLP! (i.e., analysis that is useful to an NLP
task or algorithm)
Implementation for novel use (iPhone?)
Scaling Up to Large Datasets
System calls to external software
Python is not able to perform the numerically intensive
calculations required by machine learning methods
nearly as quickly as lower-level languages such as C.
On large datasets, you may find that the learning
algorithm takes an unreasonable amount of time and
memory to complete if you use the pure-Python machine
learning implementations
NLTK provides facilities for interfacing with external
machine learning packages.
Once these packages have been installed, NLTK can
transparently invoke them (via system calls) to train
classifier models significantly faster than the pure-Python
classifier implementations.
See the NLTK webpage for a list of recommended
machine learning packages that are supported by NLTK.
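The "system call" pattern above can be sketched roughly as follows. This is not NLTK's own interface: the function name `train_with_external_tool`, the tab-separated file format, and the calling convention are all assumptions made for illustration. The "external tool" is only a second Python process standing in for a fast trainer, so that the sketch runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def train_with_external_tool(command, labeled_examples):
    """Write training data to a temp file, run `command` on it, return its stdout.

    `command` is a list of arguments; the data file path is appended last.
    The file format (label<TAB>text) is an assumption of this sketch.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as f:
        for label, text in labeled_examples:
            f.write(f"{label}\t{text}\n")
        path = f.name
    try:
        result = subprocess.run(
            command + [path], capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        os.remove(path)


# Hypothetical stand-in for an external learner: a child process that
# just counts how many training examples each label has.
FAKE_TRAINER = [
    sys.executable,
    "-c",
    "import sys, collections\n"
    "c = collections.Counter(line.split('\\t')[0] for line in open(sys.argv[1]))\n"
    "print(' '.join(f'{k}={v}' for k, v in sorted(c.items())))",
]

data = [("phone", "call mike"), ("calendar", "put dentist on tue"), ("phone", "call sally")]
model_summary = train_with_external_tool(FAKE_TRAINER, data).strip()
```

In a real project the command list would name the installed package's executable, and the returned text would be parsed into a model rather than a count summary.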
Software
If you need some fancy (i.e. expensive)
software, let me know asap
I may be able to buy it and let you use it for
the projects
Final Project Ideas
NLP with me all the time: Interfaces 90% useful
90% of the time
What are the NLP problems for a speech
interface that is always with me?
Take an audio recorder with you for a whole day.
Record all the speech commands you would
give to your perfect interface
Call mike
Write this message to sally hi sally movie tonight?
Remind me to buy milk when I go to the store
Put dentist on tue on the calendar
Where can I buy a bluetooth device nearby?
Set facebook status class today sucked glad is over
Twitter class today sucked glad is over
NLP with me all the time
Analysis
Analyze the commands
How many types of actions/classes?
Which NLP applications (translation, extraction, etc.)?
Call [Mike]: action/class = phone, argument = Mike
NLP tasks: classification and extraction
Set Facebook status [class today sucked glad is over]:
action/class = facebook, argument = [class today sucked
glad is over]
NLP tasks: classification and extraction
Build an NLP algorithm for this data
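As a starting point, the action/argument analysis above could be prototyped with a few hand-written patterns. This is a minimal sketch, not a recommended final approach: the pattern list and the function name `parse_command` are hypothetical, and a project would likely replace the rules with a trained classifier plus an extractor.

```python
import re

# Hypothetical patterns: each maps a spoken command to an action class
# and captures the rest of the utterance as the argument.
COMMAND_PATTERNS = [
    ("phone", re.compile(r"^call (?P<arg>.+)$", re.IGNORECASE)),
    ("facebook", re.compile(r"^set facebook status (?P<arg>.+)$", re.IGNORECASE)),
    ("twitter", re.compile(r"^twitter (?P<arg>.+)$", re.IGNORECASE)),
    ("reminder", re.compile(r"^remind me to (?P<arg>.+)$", re.IGNORECASE)),
]


def parse_command(utterance):
    """Return (action, argument), or ("unknown", utterance) if no pattern matches."""
    text = utterance.strip()
    for action, pattern in COMMAND_PATTERNS:
        match = pattern.match(text)
        if match:
            return action, match.group("arg")
    return "unknown", text
```

Commands that fall through to "unknown" (for example the Bluetooth question) are a quick way to see which action classes the recorded data still needs.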
Final Project Ideas
NLP summarization for audio interfaces
Summarize email, blogs, news articles
Different lengths or incremental ("tell me more",
or "tell me less, get to the point!")
(Are audio summaries different from written
ones?)
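One way to think about "tell me more / tell me less" is an extractive summarizer whose length is a parameter. The sketch below is a simple word-frequency baseline, not a method proposed in the lecture; the function name `summarize`, the scoring rule, and the example text are assumptions.

```python
import re
from collections import Counter


def summarize(text, n_sentences=1):
    """Pick the n highest-scoring sentences and return them in original order.

    A sentence's score is the mean document frequency of its words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices once; longer summaries extend shorter ones.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


EXAMPLE = (
    "The meeting is moved to Friday. "
    "Bring the budget report to the meeting. "
    "Lunch is provided."
)
```

Because the ranking is fixed, asking for `n_sentences + 1` only adds a sentence and never drops one, which matches the incremental interaction; whether spoken summaries need a different selection rule is left open.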
Final Project Ideas
Intel Reader
To assist people with various disabilities
(blindness, dyslexia)
The Intel Reader performs text-to-speech
(TTS) on captured images (with OCR) and
downloaded text files
Intel Reader
Text to speech: Improved Speech Output
Contextual Pronunciation
TTS engines still relatively poor on
context-based pronunciation variations
Examples: LIVE, LEAD
I live in California vs.
I watched the live performance of the concert
That battery is made from lead vs.
I will lead the troops into battle
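The four sentences above can be separated with a very crude context rule: look at the word just before the ambiguous one. This is only a sketch under strong assumptions. The cue-word list and the respellings are invented for illustration, and a real system would more likely use a part-of-speech tagger or a trained classifier.

```python
# Hypothetical cue words that suggest the next word is used as a verb.
VERB_CUES = {"i", "you", "we", "they", "will", "to", "can", "would"}

# word: (respelling when used as a verb, respelling otherwise)
PRONUNCIATIONS = {
    "live": ("liv", "lyve"),   # "liv" rhymes with "give", "lyve" with "five"
    "lead": ("leed", "led"),   # "leed" rhymes with "need", "led" with "bed"
}


def pronounce(tokens, index):
    """Return a respelling for tokens[index], or None if the word is not ambiguous."""
    word = tokens[index].lower()
    if word not in PRONUNCIATIONS:
        return None
    previous = tokens[index - 1].lower() if index > 0 else ""
    as_verb, otherwise = PRONUNCIATIONS[word]
    return as_verb if previous in VERB_CUES else otherwise
```

For example, `pronounce("I will lead the troops into battle".split(), 2)` selects the verb reading because the preceding word is "will".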
Final Project Ideas
Two NLP problems for Intel Reader
Contextual Pronunciation
Identify words that have ambiguous
pronunciation
Choose the right pronunciation
OCR errors
Identify words that are mistakes (e.g., o/c confusion:
"miso" vs. "misc")
Choose the right words
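For the OCR problem, a simple baseline is to flag tokens that are not in a word list and replace each with its closest listed word. The sketch below uses Python's standard `difflib`; the toy vocabulary, the cutoff value, and the function name `correct_ocr` are assumptions.

```python
import difflib

# Toy word list, assumed for this sketch; a project would use a real lexicon.
VOCABULARY = ["that", "battery", "is", "made", "from", "lead"]


def correct_ocr(tokens, vocabulary=VOCABULARY, cutoff=0.7):
    """Replace out-of-vocabulary tokens with the closest vocabulary word, if any."""
    known = set(vocabulary)
    corrected = []
    for token in tokens:
        if token in known:
            corrected.append(token)
            continue
        # get_close_matches returns the best matches above the similarity cutoff.
        candidates = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else token)
    return corrected
```

This baseline cannot catch errors that produce another valid word (such as "miso" for "misc"); those need context, which is where the NLP part of the project comes in.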
Final Project Ideas
Blog analysis
Categorize blog topics (maybe including link analysis)
Segment blogs into pieces based on topics
Do blog author analysis
Summarize blog reaction to some event, e.g., what
did people think of "An Inconvenient Truth"?
There is a contest on this:
http://www.icwsm.org/
Final Project Ideas
Create a Negativity/Emotion/Flame
Recognizer
There is some related work, but this is
somewhat under-explored
Emotions in email, blogs, facebook statuses
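A very rough baseline for such a recognizer is a word-list scorer. The lexicon below is invented for this sketch and far too small to be useful; it only illustrates the shape of a first experiment, which a project would then compare against a trained classifier.

```python
import re

# Tiny hand-made lexicons, invented for this sketch.
NEGATIVE_WORDS = {"sucked", "hate", "awful", "terrible", "stupid"}
POSITIVE_WORDS = {"glad", "love", "great", "nice", "thanks"}


def negativity_score(text):
    """Return (negative count - positive count) / token count; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    negative = sum(t in NEGATIVE_WORDS for t in tokens)
    positive = sum(t in POSITIVE_WORDS for t in tokens)
    return (negative - positive) / len(tokens)
```

Note that a status like "class today sucked glad is over" scores exactly zero here, because "sucked" and "glad" cancel out. Cases like this are why word counting alone is not enough.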
Previous Final Project
HomeSkim (2005)
Chan, Lib, Mittal, Poon
Apartment search mashup
Extracted fields from Craigslist listings
http://www.ischool.berkeley.edu/programs/masters/projects/2006/homeskim
Orpheus (2004)
Maury, Viswanathan, Yang
Tool for discovering new and independent recording artists
Extracted artists, links, reviews from music websites
http://groups.sims.berkeley.edu/orpheus/demo/orpheus_demo.swf
Breaking Story (2002)
Reffell, Fitzpatrick, Aydelott
Summarize trends in news feeds
Categories and entities assigned to all news articles
http://dream.sims.berkeley.edu/newshound/
HomeSkim Craigslist Analysis