Search Limits

Today I learned about some under-documented limits on Google’s AJAX search API.  While working on my Searcher class (that will eventually generate training sets for the SVM) I asked Java to print the first 50 page titles that Google returned.  Every time I ran the program I would get a JSONException after 28 results.  Upon further examination, I found that Google returned the following 400 Bad Request JSON whenever I sent a request with the parameter &start greater than 28:

{
"responseData": null,
"responseDetails": "out of range start",
"responseStatus": 400
}

This seemed a little absurd, considering that in previous queries Google claimed to have found over 14 million results for the same search terms. Naturally, I started digging online to see if anyone else had encountered this magic 28 barrier. I soon learned that the AJAX search API is limited to 32 results, and that in order to get all 32 you must include the &rsz=large directive in your request, dictating 8 results per request instead of 4.

This could really hinder the quality of my training sets. I suppose I can just add results to the 100 most recent for each category (I wrote a nice little class to do just that) but then it could take a while to build a diverse training set, several days even if the results changed every day. On the other hand, I read that Yahoo’s web search API offers up to 1000 results with a cap of 5000 queries in 24 hours. Switching to Yahoo might be a good option, if their results are kept as up-to-date as Google’s. I’ll have to do some research, or maybe make the search interface modular so I can try both.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s