Where did our search go?
March 12: Sphider, our search engine, died. The latest set of countries pushed the number of dynamic pages past the 150,000 mark. Sphider’s indexer wasn’t designed to handle a site that big, and it couldn’t complete a full re-index. Sphider also supported only single-threaded indexing, and with that many pages the time to re-index ballooned from about 18 hours to nearly 130 hours (an estimate, since the indexer never finished). That’s too long for our needs, and we expect to have nearly 300,000 pages when we’re done adding countries later this year.
So we temporarily removed the search boxes from everything but the blog area. We plan to have a replacement in place within the next couple of weeks. We think we’ve identified at least one open-source search engine that will work well for us, but the proof will be on our test box, where we’ll try to get it to run a full index in under a day without wiping out the box’s ability to serve pages to users.
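For a rough sense of the gap between where Sphider left us and the under-a-day target, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes the estimated 130 hours applies to roughly 150,000 pages and that indexing time scales linearly with page count; both are simplifications, and the variable names are just for illustration.

```python
# Back-of-the-envelope indexing throughput from the figures in this post.
# Assumptions: the ~130-hour estimate covers roughly 150,000 pages,
# and indexing time scales linearly with the number of pages.

def pages_per_hour(pages: int, hours: float) -> float:
    """Average indexing rate needed to cover `pages` in `hours`."""
    return pages / hours

current_pages = 150_000    # dynamic pages today (approximate)
projected_pages = 300_000  # expected once all countries are added
estimated_hours = 130      # estimated full re-index time with Sphider
target_hours = 24          # goal: a full index in under a day

implied_rate = pages_per_hour(current_pages, estimated_hours)
needed_now = pages_per_hour(current_pages, target_hours)
needed_later = pages_per_hour(projected_pages, target_hours)

# How many times faster than the implied Sphider rate we would need to be.
speedup_now = needed_now / implied_rate
speedup_later = needed_later / implied_rate

print(f"Implied Sphider rate:     {implied_rate:,.0f} pages/hour")
print(f"Needed for 150,000 pages: {needed_now:,.0f} pages/hour ({speedup_now:.1f}x)")
print(f"Needed for 300,000 pages: {needed_later:,.0f} pages/hour ({speedup_later:.1f}x)")
```

Under those assumptions, a replacement has to index several times faster than Sphider did just to hit the one-day mark today, and roughly twice that again once the remaining countries are added.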
We’ll post an update to this blog entry when we have a replacement in place.
March 21 UPDATE: We finally have DataparkSearch Engine in place. It’s a native Linux application that runs as a CGI. Searches are fast. Indexing is still slow, but I can now see that indexing speed is a function of the VPS on which this site is hosted. Unlike with Sphider, though, I don’t think we’ll need to do a full re-index every week to catch new pages. We’ll see. I’ll post another update in a few weeks, after we see how the search engine is doing.
