To-do from CCSC-NW 08

Here is a list of the potential changes to SVMTrainer that were suggested to me during this weekend’s conference.

Searcher

  • Set limits on acceptable web document sizes to cut document retrieval time
  • Try using a small initial search as a seed to get other search terms and expand the diversity of my training set – Yahoo! Term Extraction might be good for this, too.
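
As a rough illustration of the size-limit idea, here is a minimal Python sketch (Python is used only for illustration; it is not necessarily what SVMTrainer is written in). The function name `acceptable_size`, the byte thresholds, and the assumption that the response headers arrive as a plain dict with a `Content-Length` key are all hypothetical.

```python
def acceptable_size(headers, min_bytes=1_000, max_bytes=500_000):
    """Decide from response headers whether a document is worth downloading.

    The thresholds are placeholder values, not tuned numbers.
    Returns True when the size is unknown, so the caller can still
    fetch the page and cap the number of bytes it reads instead.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return True  # server did not declare a size
    try:
        size = int(raw)
    except ValueError:
        return True  # malformed header: treat as unknown
    return min_bytes <= size <= max_bytes
```

Checking the declared size before downloading the body is what saves time here; documents that do not declare a size would still need a read cap of some kind.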

WordFilter

  • Try implementing WordNet in the WordFilter class
  • Find a use for Yahoo! Term Extraction

WebDocument

  • Implement parallelism in the retrieval of search results and the retrieval of web documents
  • Implement a document retrieval timeout and a URL blacklist to prevent hanging on bad downloads
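
The two WebDocument items fit together naturally, so here is a hedged Python sketch of how they might combine: a thread pool for parallel downloads, a per-request timeout, and a blacklist that collects URLs whose downloads failed. The names `fetch`, `fetch_all`, and `fetcher` are made up for this example, and the worker count and timeout are placeholder values.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=5.0):
    """Download one page; urlopen raises on timeouts and HTTP errors.

    Note: this timeout limits each blocking socket operation,
    not the total time spent on the download.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, blacklist=None, max_workers=8):
    """Fetch URLs in parallel and return (pages, blacklist).

    URLs already on the blacklist are skipped, and any URL whose
    download fails is added to it so later runs do not retry it.
    """
    blacklist = set() if blacklist is None else blacklist
    todo = [u for u in urls if u not in blacklist]
    pages = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetcher, u): u for u in todo}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception:
                # Any failure (timeout, DNS error, HTTP error) blacklists the URL.
                blacklist.add(url)
    return pages, blacklist
```

Threads suit this job because the work is almost entirely waiting on the network. Passing the downloader in as `fetcher` also makes it easy to try the logic with a fake function before pointing it at real search results.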

Other

  • Investigate the use of SVMstruct for categorization/ranking problems in multiple dimensions
  • Start checking the accuracy of trained sets independently by holding out 10% of the results, classifying them with the trained model instead of training on them
  • Learn about Xi Alpha estimates and what exactly they mean
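
The 10% hold-out check can be sketched in a few lines of Python. This is only an illustration under assumptions: `split_holdout` and `accuracy` are hypothetical helper names, and the examples are treated as an opaque list, since how SVMTrainer stores its training data is not described here.

```python
import random


def split_holdout(examples, holdout_fraction=0.1, seed=0):
    """Shuffle the examples and return (training_set, held_out_set)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(predicted, actual):
    """Fraction of held-out examples whose predicted label matches the true one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

The idea would be to train only on the first list, classify the second list with the resulting model, and compare those predictions against the known labels with `accuracy`.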