The sites parameter in Yahoo! BOSS gives us great flexibility for building vertical search engines. However, the number of characters we can pass is limited. Here are some tips for keeping the value as short as possible.
- Forget passing subdirectories: foo.com/bar is treated the same as foo.com. BOSS does differentiate between subdomains, however, so bar.foo.com is not the same as foo.com. In a real-world example, passing finance.yahoo.com/news will be interpreted as finance.yahoo.com, but finance.yahoo.com will give different results than sports.yahoo.com.
- Remove www from the URL; it just wastes space. There may be an exception when a site was not set up to work without the www subdomain, but I doubt this would make an impact.
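Both tips can be applied automatically before the list is sent. The following is a minimal sketch, assuming the sites value is a comma-separated list of hosts (as in moma.org,louvre.fr). The helper names `shorten_site` and `build_sites_param` are hypothetical, and the helper strips "www." unconditionally, so it ignores the rare site that only works with the www subdomain.

```python
from urllib.parse import urlsplit


def shorten_site(entry: str) -> str:
    """Reduce a URL or domain to its host, without path or leading 'www.'."""
    entry = entry.strip()
    # urlsplit only fills in the host when a scheme or '//' prefix is present,
    # so add '//' for bare entries such as "finance.yahoo.com/news".
    if "//" not in entry:
        entry = "//" + entry
    host = urlsplit(entry).hostname or ""  # hostname is lowercased; path is dropped
    # A leading "www." only costs characters, so remove it.
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def build_sites_param(entries) -> str:
    """Shorten every entry, drop blanks and duplicates, and join with commas."""
    hosts = []
    for entry in entries:
        host = shorten_site(entry)
        if host and host not in hosts:
            hosts.append(host)
    return ",".join(hosts)


print(build_sites_param([
    "http://www.moma.org/collection",
    "finance.yahoo.com/news",
    "sports.yahoo.com",
    "moma.org",
]))
```

Subdomains are kept intact because, as noted above, BOSS treats finance.yahoo.com and sports.yahoo.com as different sites.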
I recently gave a presentation on Yahoo! BOSS in London with Skills Matter. The small group was full of ideas for extending BOSS functionality. I wrote a new post for the Yahoo Developer Network that expands on some of these concepts: Make BOSS More Dynamic.
The post discusses generating the “sites” argument, which tells BOSS to limit results to a specified list of websites, dynamically for each query. This lets each query determine which sites are the experts and then build a result set from those experts.
I have built a prototype and will release it this week, after I have time to clean up some loose ends.
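To make the idea concrete, here is one possible heuristic, sketched under stated assumptions; it is not the prototype mentioned above. It assumes you already have the result URLs from a broad, unrestricted query (fetching them is left out), and it treats the hosts that appear most often in those results as the "experts." The function name `pick_expert_sites` and the `max_sites` cutoff are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def pick_expert_sites(result_urls, max_sites=5) -> str:
    """Hypothetical helper: build a sites string from the most frequent hosts."""
    counts = Counter()
    for url in result_urls:
        host = urlsplit(url).hostname
        if not host:
            continue  # skip entries without a scheme/host
        if host.startswith("www."):
            host = host[len("www."):]  # count www.louvre.fr and louvre.fr together
        counts[host] += 1
    # most_common orders ties by first appearance in the results.
    return ",".join(host for host, _ in counts.most_common(max_sites))


first_pass = [
    "http://www.louvre.fr/a",
    "http://moma.org/b",
    "http://louvre.fr/c",
    "http://example.com/d",
    "http://moma.org/e",
    "http://louvre.fr/f",
]
print(pick_expert_sites(first_pass, max_sites=2))
```

The resulting string would then be passed as the sites argument of a second, restricted query.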
I’m working on a new art search engine that passes a large list of museums in the sites parameter. The URL length is pushing the character limit for requests, so I assumed that my problem with spam sites infiltrating the results was caused by the string length.
A recent discussion on the BOSS users mailing list prompted me to raise this with one of the BOSS engineers today, and we discovered a more basic problem with my requests: I was sending extra commas in the string. Instead of moma.org,louvre.fr, I had moma.org,,louvre.fr.
Sometimes you need to go back to basics to find your problems. Thanks to Mithun for helping me out.
I don’t know if this is the cause of Michael’s problems in the original mailing list request, but it is a good thing to remember.
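One cheap safeguard against the stray-comma problem is to normalize the sites string right before sending it. This is a minimal sketch; the helper name `clean_sites_param` is hypothetical, and it only removes empty entries and surrounding whitespace without validating the hosts themselves.

```python
def clean_sites_param(raw: str) -> str:
    """Drop empty entries (e.g. from ',,') and stray whitespace from a sites string."""
    parts = (part.strip() for part in raw.split(","))
    return ",".join(part for part in parts if part)


print(clean_sites_param("moma.org,,louvre.fr"))
```

Running it on the broken value from my requests turns moma.org,,louvre.fr back into moma.org,louvre.fr.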