Today I am configuring spell correction in Solr 5.5. Enabling it is not very hard: simply select which spellcheck component you want to use (see https://cwiki.apache.org/confluence/display/solr/Spell+Checking for the alternatives). There are several, but I selected solr.IndexBasedSpellChecker, which works for what I need. I replaced the one that comes in solrconfig.xml and then added spellcheck to the request handler's last-components. I reindexed, committed, and it works. Most people stop here, but I wanted to learn more, so here is some very good recommended reading to understand spellchecking better: Getting started Spell Checking with Apache Lucene and Solr. It references a more technical post, http://norvig.com/spell-correct.html, and these two go into even more technical depth: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36180.pdf and http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=52A3B869596656C9DA285DCE83A0339F?doi=10.1.1.146.4390&rep=rep1&type=pdf
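Once the component is in last-components, you can check it from a client. Below is a minimal Python sketch (Python and its standard library are my choice here; the post itself has no code). It assumes Solr runs on localhost:8983, that the spellcheck component is attached to /select as described above, and that the JSON response writer uses its default flat layout for named lists, where spellcheck.suggestions alternates term and details. The core name "mycore" in the usage note is hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr 5.5 on the default port; adjust for your server.
SOLR_BASE = "http://localhost:8983/solr"


def spellcheck_url(core, text, count=5, base=SOLR_BASE):
    """Build a /select URL asking the spellcheck component for suggestions."""
    params = {
        "q": text,
        "rows": 0,                   # we only want the spellcheck section
        "spellcheck": "true",        # turn the component on for this request
        "spellcheck.q": text,        # the text to check
        "spellcheck.count": count,   # max suggestions per misspelled term
        "spellcheck.collate": "true",
        "wt": "json",
    }
    return f"{base}/{core}/select?{urlencode(params)}"


def parse_suggestions(response):
    """Map each misspelled term to its suggested words.

    Assumption: spellcheck.suggestions is a flat list alternating
    term, details, term, details, ... (the default json.nl=flat layout).
    """
    flat = response.get("spellcheck", {}).get("suggestions", [])
    result = {}
    for term, details in zip(flat[0::2], flat[1::2]):
        if not isinstance(details, dict):
            continue  # ignore non-term entries some versions mix into the list
        words = []
        for s in details.get("suggestion", []):
            # Plain strings by default; {"word": ..., "freq": ...} objects
            # when spellcheck.extendedResults=true.
            words.append(s["word"] if isinstance(s, dict) else s)
        result[term] = words
    return result


def suggest(core, text, base=SOLR_BASE):
    """Query a running Solr and return {term: [suggestions]}."""
    with urlopen(spellcheck_url(core, text, base=base)) as resp:
        return parse_suggestions(json.load(resp))
```

Usage against a live server would be something like suggest("mycore", "solar"). Keeping the URL building and the parsing separate lets you test the parsing against a saved response without a running Solr.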
There are multiple ways of creating cores in Solr. It is very straightforward: you can call Solr's CoreAdmin API with action=CREATE, or you can use bin\solr.cmd. However, you could run into a small issue, so let me quickly explain the scenario. First of all, you can create a core using solr.cmd with the following command:
bin\solr.cmd create -c <nameofthecore>
A fresh new core is created, and the command echoes back the call it made:
http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses
So what if you are curious and decide to make the call directly yourself (of course, changing the core name)?
http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses
Well, it does not work! The hint is in the error: Solr can't find some resources, namely solrconfig.xml. (Presumably the script gets away with the same call because it copies a configset into the new core's folder before making it.) To solve this issue, you only need to specify which base configuration you want to use. So the call would be:
http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses&configSet=basic_configs
And presto, you get your core! A little detail, but worth knowing what was missing.
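If you script core creation, the only thing to remember is the configSet parameter. Here is a small Python sketch of that call; it assumes a local Solr on the default port and reuses the post's "othercourses" core and basic_configs configset, so swap in your own names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local Solr on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def create_core_url(name, config_set="basic_configs", base=SOLR_BASE):
    """CoreAdmin CREATE call that names a configSet, so Solr can find
    solrconfig.xml and the schema for the new core."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": name,
        "configSet": config_set,
        "wt": "json",
    }
    return f"{base}/admin/cores?{urlencode(params)}"


def create_core(name, config_set="basic_configs", base=SOLR_BASE):
    """Send the request; urlopen raises HTTPError if Solr rejects it."""
    with urlopen(create_core_url(name, config_set, base)) as resp:
        return resp.read().decode("utf-8")
```

Calling create_core("othercourses") issues the same request as the working URL above, with wt=json added so the response is easy to inspect.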
Life is like a box of chocolates. You never know what you are going to get!
A friend of mine, Katherine, volunteers ad honorem at a foundation called Lifting Hands, which aims to help children from a very poor neighborhood in Costa Rica learn new skills and grow up to be respectable members of society.
The Request
One day, while we were talking about Big Data, Solr, and the typical geek stuff we discuss all the time, she asked me if I wanted to go one afternoon and talk to her kids about what it was like to grow up to be a computer programmer, and hopefully motivate them. There were two groups: 10-12 and 12-14 year olds.
What I Thought
Piece of cake. I am pretty good at presenting. I’ve done it in front of up to 850 people, spent years as a developer evangelist for Microsoft/Artinsoft, and now I enjoy creating content as a Pluralsight author. People also tell me that I am good at motivating others to get into programming, given the passion that I have for this field. So my answer was a quick yes. “What could go wrong?”
The Briefing
Katherine then sat down with […]
A couple of days ago I was asked: how do we monitor our cluster? Well, there are professional ways and others for budget-conscious deployments. Here are a few options that came to mind:
- The ping request handler (/admin/ping) can be used to determine if a node is up and running. This is useful if you want the load balancer to determine which nodes are responding.
- Additionally, I've seen environments where a monitoring service issues several predefined queries at a fixed interval and sends a notification if no response is received. Something like http://www.site24x7.com/, but behind the firewall. I do not know which monitoring services, if any, you might have.
- There are also more specialized tools, for example Sematext, although some of them are more Linux-friendly, so you may need to look for Windows counterparts if you don't run Linux.
- You can also use clusterstate.json from ZooKeeper (for prod this would be https:///solr/zookeeper?detail=true&path=/clusterstate.json), which will tell you the state of the nodes. You just need to do a bit of parsing, which can be done pretty easily with Json.Net, which is easy to learn.
And regarding […]
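For the budget-conscious options (the ping handler and clusterstate.json), here is a rough Python sketch. It stands in for the Json.Net parsing the post mentions, so treat it as an illustration rather than the author's setup. Assumptions: the ping handler sits at /admin/ping and answers {"status": "OK"}; the /solr/zookeeper servlet returns the znode wrapped as {"znode": {"data": "<json text>"}}; and clusterstate.json follows the collections → shards → replicas layout with a "state" per replica. Two caveats: in Solr 5.x, collections using the newer state format keep their state in /collections/<name>/state.json, so clusterstate.json may simply be {}; and a replica can still read "active" for a node that just died, so cross-checking /live_nodes is a good idea.

```python
import json
from urllib.request import urlopen


def ping_ok(base, core, timeout=5.0):
    """True if the core's ping handler (assumed at /admin/ping) answers status OK."""
    url = f"{base}/{core}/admin/ping?wt=json"
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("status") == "OK"
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors
        # (URLError/HTTPError subclass it); ValueError covers bad JSON.
        return False


def clusterstate_from_servlet(payload):
    """Extract clusterstate.json from the /solr/zookeeper?detail=true response.

    Assumption: the servlet wraps the znode as {"znode": {"data": "<json text>"}}.
    """
    return json.loads(payload.get("znode", {}).get("data") or "{}")


def unhealthy_replicas(clusterstate):
    """List (collection, shard, node, state) for every replica not 'active'."""
    problems = []
    for coll_name, coll in clusterstate.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for rep_name, rep in shard.get("replicas", {}).items():
                state = rep.get("state")
                if state != "active":
                    node = rep.get("node_name", rep_name)
                    problems.append((coll_name, shard_name, node, state))
    return problems


def check_cluster(zk_servlet_url, timeout=5.0):
    """Fetch clusterstate through the servlet and report unhealthy replicas."""
    with urlopen(zk_servlet_url, timeout=timeout) as resp:
        return unhealthy_replicas(clusterstate_from_servlet(json.load(resp)))
```

A scheduled job could call ping_ok for each node and check_cluster against the prod servlet URL, and alert when either returns something other than True or an empty list.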
I had to look for empty values in a mandatory field in Solr today. Wait, what? Shouldn't mandatory fields be marked as required="true" when you define them in the schema? Well yes, but some people forget to do it, or maybe the spec was not complete when they worked on the schema, so they left it out… just in case! (YAGNI definitely comes to mind.) In any case, I had to find which documents did not have a publication date (which sounds like a really, really, really mandatory field to me). So how do you identify them?
Option A: Query *:* and start paginating, taking notes of which documents do not have the value… OK, this is a totally brute-force approach, but I wouldn't be too surprised to find someone doing it. The things I have seen…
Option B: Query *:* and include only id and publicationdate in fl. Paginate or request enough rows. Very amateur, but a bit better than before.
Option C: Query *:*, include only the two fields in fl, and sort ascending by publicationdate! Much better, as the documents with an empty value will be at the beginning of your results.
Option […]
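Option C is easy to automate. The Python sketch below builds the sorted query and pages through it until it reaches the first document that has a date. It assumes a local Solr, a unique key named "id", and the post's publicationdate field, and it assumes (as the post observed) that documents without a value sort first; whether they do depends on the field type's sortMissingFirst/sortMissingLast settings. The fetch function is injected so the paging logic can be tested without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: local Solr, a unique key called "id", and the post's field name.
SOLR_BASE = "http://localhost:8983/solr"
FIELD = "publicationdate"


def option_c_url(core, start=0, rows=100, field=FIELD, base=SOLR_BASE):
    """Option C: match everything, return only id + the field, sort ascending."""
    params = {
        "q": "*:*",
        "fl": f"id,{field}",
        "sort": f"{field} asc",
        "start": start,
        "rows": rows,
        "wt": "json",
    }
    return f"{base}/{core}/select?{urlencode(params)}"


def collect_missing(fetch_page, field=FIELD, rows=100):
    """Page through sorted results and collect ids that lack `field`.

    fetch_page(start, rows) returns the parsed JSON of one page. Solr omits
    a field from a returned doc when it has no value, so doc.get() gives
    None. Because missing values are assumed to sort first, we stop at the
    first document that does have a value.
    """
    missing, start = [], 0
    while True:
        docs = fetch_page(start, rows)["response"]["docs"]
        for doc in docs:
            if doc.get(field) in (None, "", []):
                missing.append(doc["id"])
            else:
                return missing
        if len(docs) < rows:
            return missing  # reached the last page
        start += rows


def solr_fetcher(core, base=SOLR_BASE):
    """Adapter that fetches pages of the Option C query from a live Solr."""
    def fetch(start, rows):
        with urlopen(option_c_url(core, start, rows, base=base)) as resp:
            return json.load(resp)
    return fetch
```

Against a real core this would be collect_missing(solr_fetcher("mycore")), where "mycore" is a placeholder. If your field type sorts missing values last instead, the early stop no longer holds and you would need to scan every page.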
There are times when you want to optimize your Solr index. But what is optimize, and why do I care? Optimizing is similar to defragmenting your hard drive: Solr merges the index segments into a new index, removing any deleted documents in the process. It is simply housekeeping at its best. I usually run an optimize from the Admin UI, on the core's Overview tab. However, sometimes we might want to do it programmatically, a good example being when you have a spell checker configured to build its dictionary on optimize (buildOnOptimize). The URL to optimize is very simple; here is an example with my localhost, just replace it with your Solr: http://localhost:8983/solr/yourcore/update?stream.body=<optimize/> (a <query> element inside <optimize> is not needed; <query> belongs to delete commands). Notice that, unlike the Admin UI URLs (http://localhost:8983/solr/#/yourcore), REST calls do not include the #. Happy optimizing!
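To trigger the optimize from code, for example after a bulk load so a buildOnOptimize spell checker rebuilds its dictionary, something like the following Python sketch should do. It assumes a local Solr on the default port and the stream.body form shown above; the optional maxSegments attribute is part of Solr's XML <optimize> message, and "yourcore" is the post's placeholder. Solr also accepts optimize=true as a plain request parameter on /update, if you prefer to avoid the XML body.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr, default port


def optimize_url(core, max_segments=None, base=SOLR_BASE):
    """Build the stream.body optimize call.

    max_segments is optional; without it a full optimize merges the index
    down to a single segment.
    """
    if max_segments is None:
        body = "<optimize/>"
    else:
        body = f'<optimize maxSegments="{int(max_segments)}"/>'
    # urlencode percent-encodes <, >, / and quotes so the XML survives the URL.
    return f"{base}/{core}/update?{urlencode({'stream.body': body, 'wt': 'json'})}"


def optimize(core, base=SOLR_BASE, timeout=600):
    """Run the optimize and report whether Solr returned status 0.

    Optimizing a large index can take a while, hence the generous timeout.
    """
    with urlopen(optimize_url(core, base=base), timeout=timeout) as resp:
        return json.load(resp)["responseHeader"]["status"] == 0
```

Because an optimize rewrites the whole index, it is usually best scheduled off-peak rather than after every small update.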
Many times I have stopped and restarted Solr to reload a core. Yes, it is kind of a rookie way, as you can always go to the Admin UI, open Core Admin, and reload the core there. But what if you wanted a really fast way of reloading your core? Just do it via the CoreAdmin handler: http://{SOLR IP}:{SOLR PORT}/solr/admin/cores?action=RELOAD&core={CORE NAME} You can even add it to your code as a simple HTTP call, or better yet use SolrNet's core admin functionality, documented here: https://github.com/mausch/SolrNet/blob/master/Documentation/Core-admin.md
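If you are not on .NET (SolrNet is the post's suggestion for that), the "simple call" can be as small as this Python sketch. It mirrors the URL template above, with host, port, and core name as parameters, and adds wt=json so the response header is easy to check; the timeout value is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(host, port, core):
    """CoreAdmin RELOAD call, mirroring the URL template in the post."""
    query = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def reload_succeeded(response):
    """CoreAdmin reports success with responseHeader.status == 0."""
    return response.get("responseHeader", {}).get("status") == 0


def reload_core(host, port, core, timeout=60):
    """Reload without restarting Solr; urlopen raises HTTPError if Solr
    answers with an error status."""
    with urlopen(reload_url(host, port, core), timeout=timeout) as resp:
        return reload_succeeded(json.load(resp))
```

For example, reload_core("localhost", 8983, "mycore") after editing a core's configuration, where "mycore" is a placeholder.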