Stemming and Multi Language

by Xavier Comments: 0

I received a question today on stemming and multi language. Basically, “why do we need multiple fields in our Solr in different languages and how do I test multi language stemming?”.

First of all, let’s explain what stemming is. Stemming involves reducing words to their stem (or base or root) during indexing and querying in an effort to improve recall.

For example, if a document includes the phrase “Xavier walked to work every morning from Westside Parkway” and a user searches for walk, then the results will correctly include that document, even though it contains walked rather than walk.
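To make the idea concrete, here is a minimal sketch in Python. It assumes a deliberately naive suffix-stripping stemmer written just for this illustration; it is not the algorithm Solr uses, and the names `naive_stem` and `index_terms` are made up for the example. The point it shows is that the same reduction is applied at index time and at query time, so walk and walked meet in the middle.

```python
# Toy stemmer: strips a few common English suffixes.
# This is an assumption for illustration only, not Solr's stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Lowercase the word and strip one known suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Index-time step: split on whitespace and stem every token."""
    return {naive_stem(token) for token in text.split()}


document = "Xavier walked to work every morning from Westside Parkway"
index = index_terms(document)

# Query-time step: the user's term goes through the same stemmer.
query = naive_stem("walk")
matches = query in index
```

Because both sides are reduced to the same stem, `matches` ends up true even though the document never contains the literal word walk.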

On Getting Started with Agile

by Xavier Comments: 0

I was having a conversation today with a person who needed some help teaching his PMs Agile. I had a very simple response: get them started by watching the excellent training courses available on Pluralsight.

So, the first thing I told him was:

– Agile has proven to be a successful methodology in software… when done right

Error Creating Cores in Solr 5.5

by Xavier Comments: 0

There are multiple ways of creating cores in Solr, and it is usually very straightforward. One way is to call Solr’s REST admin handler with action=CREATE, and another is to use bin\solr.cmd. However, you could run into a small issue with the first one, so let me quickly explain the scenario.

First of all, you can create a core using solr.cmd with the following command:

bin\solr.cmd create -c <nameofthecore>

And a fresh new core is created. The command also echoes back the call it made: http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses

Create core

So then, what if you are curious and decide to make the call directly yourself (changing the core name, of course)?


Well, it does not work!

Failed Create

The hint there is that it can’t find some resources, namely solrconfig.xml. To solve this issue, you only need to specify the base configuration that you want to use. So the call would be:
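As a rough sketch of what such a call can look like, the CoreAdmin CREATE action accepts a `configSet` parameter that names the base configuration. The snippet below only builds the URL. It assumes the `basic_configs` config set that ships with Solr 5.x, and the core name `morecourses` is a placeholder I picked for the example.

```python
from urllib.parse import urlencode


def create_core_url(base_url, name, config_set="basic_configs"):
    """Build a CoreAdmin CREATE URL that also names the config set to use.

    config_set defaults to "basic_configs", which is an assumption here;
    use whichever config set exists under your Solr's configsets folder.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": name,
        "configSet": config_set,
    }
    return f"{base_url.rstrip('/')}/solr/admin/cores?{urlencode(params)}"


# "morecourses" is a hypothetical core name.
url = create_core_url("http://localhost:8983", "morecourses")
```

Pasting the resulting URL into a browser is enough to try it; no extra tooling is required.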


And presto, you get your core! It is a small detail, but it is worth knowing what was missing.

Monitoring a Solr Cluster

by Xavier Comments: 0

A couple of days ago I got asked: how do we monitor our cluster? Well, there are professional ways and others for the budget-conscious deployment. Here are a few options that came to my mind:

  • You have the ping request handler, which can be used to determine if a node is up and running. This is useful if you want the load balancer to work out which nodes are responding.
  • Additionally, I’ve seen environments where a monitoring service issues several predefined queries at a predefined interval and sends a notification if no response is received. Something like an external uptime-monitoring service, but behind the firewall. I do not know which monitoring services, if any, you might have.
  • And there are more specialized tools, for example Sematext, although some of them are more Linux friendly, so you will need to look for Windows counterparts if you don’t have Linux.
  • Also you can use the clusterstate.json from ZooKeeper (this would be the one from prod https:///solr/zookeeper?detail=true&path=/clusterstate.json), which will tell you the state of the nodes. You just need to do a bit of parsing, which can be done pretty easily with Json.Net, which is easy to learn.
  • And regarding monitoring your cluster’s use in terms of queries done, you can definitely use the Solr logs and analyze the queries.
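Two of the options above, the ping handler and the clusterstate.json parsing, can be sketched in a few lines. I am using Python here instead of Json.Net just to keep it short. The sketch assumes that you already fetched the response bodies yourself, that the ping handler was called with wt=json, and that the cluster state has the usual collection, shards, replicas layout. The function names are made up for this example.

```python
import json


def ping_is_ok(ping_response_text):
    """Return True when a ping handler response (wt=json) reports status OK."""
    try:
        body = json.loads(ping_response_text)
    except ValueError:
        # Not JSON at all, for example an HTML error page from a proxy.
        return False
    return isinstance(body, dict) and body.get("status") == "OK"


def inactive_replicas(clusterstate_text):
    """List (collection, shard, replica, state) for every replica not 'active'.

    Assumes clusterstate_text is the clusterstate.json document itself.
    """
    state = json.loads(clusterstate_text)
    problems = []
    for collection, details in state.items():
        for shard, shard_info in details.get("shards", {}).items():
            for replica, info in shard_info.get("replicas", {}).items():
                if info.get("state") != "active":
                    problems.append((collection, shard, replica, info.get("state")))
    return problems
```

A scheduled job could call both functions and send an email whenever `ping_is_ok` is false or `inactive_replicas` returns a non-empty list.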

Just a few thoughts I wanted to share.

How to Send Solr Optimize Command

by Xavier Comments: 1

There are times when you want to optimize your Solr index. But what is optimize and why do I care?

Optimize is similar to defragmenting your hard drive. Solr rewrites the index, merging its segments and dropping any deleted documents. It is simply housekeeping at its best.

I usually run the optimize from the Admin UI, on the core’s Overview tab.

Solr Optimize


However, sometimes we might want to do it programmatically, a good example being when you have a spell checker configured to build its dictionary on optimize. The URL to optimize is very simple. Here is an example with my localhost, just replace it with your Solr:
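As a sketch of the shape of that URL, the update handler accepts an `optimize=true` parameter. The snippet only builds the address, and `mycore` is a placeholder core name for the example.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core):
    """Build the update-handler URL that asks Solr to optimize one core."""
    return f"{base_url.rstrip('/')}/solr/{core}/update?" + urlencode({"optimize": "true"})


# "mycore" is a hypothetical core name; replace host and core with your own.
url = optimize_url("http://localhost:8983", "mycore")
```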



Notice that REST calls do not include the # that you see in the address bar when the Admin UI loads.

Happy optimizing!

Easy way to do a Solr Core Reload

by Xavier Comments: 0

Many times I have stopped and restarted Solr just to reload a core. Yes, it is kind of a rookie way, as you can always go to the Admin UI, open Core Admin and reload the core.

But what if you wanted to have a really fast way of reloading your core?

Just do it via the admin handler!
http://{SOLR IP}:{SOLR PORT}/solr/admin/cores?action=RELOAD&core={CORE NAME}

You can even add it to your code and make a simple call, or better yet use SolrNet, which exposes this through its core admin functionality.
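If you are not on .NET, the “simple call” could look like this in Python. It is only a sketch: the host, port, and core name are placeholders, and `reload_core` simply treats an HTTP 200 as success without inspecting the response body.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(host, port, core):
    """Build the CoreAdmin RELOAD URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def reload_core(host, port, core, timeout=10):
    """Call the RELOAD URL; returns True when Solr answers with HTTP 200."""
    with urlopen(reload_url(host, port, core), timeout=timeout) as response:
        return response.status == 200


# "mycore" is a hypothetical core name.
url = reload_url("localhost", 8983, "mycore")
```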


Easiest Way of Issuing Solr Commit Command

by Xavier Comments: 0

There are a couple of ways to trigger a commit command in Solr. The easiest way is via a URL:


(Replace localhost:8983 with your Solr URL and the core name with your own.)
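As a sketch of what that URL looks like, the update handler accepts a `commit=true` parameter. The core name `mycore` below is a placeholder for the example.

```python
from urllib.parse import urlencode


def commit_url(base_url, core):
    """Build the update-handler URL that triggers a commit on one core."""
    return f"{base_url.rstrip('/')}/solr/{core}/update?" + urlencode({"commit": "true"})


# "mycore" is a hypothetical core name.
url = commit_url("http://localhost:8983", "mycore")
```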

But you can also commit using the Documents option from the Admin UI. Simply navigate to Documents, using this URL:


And select Solr Command (raw XML or JSON), adding the command


Solr Command

Click Submit Document and it just works. And if you are using SolrCloud, the commit goes to every node in the collection.
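The same raw XML command can also be posted from code. The sketch below prepares a POST of `<commit/>` to the update handler without sending it. The core name `mycore` is a placeholder, and the request is only sent if you pass it to `urlopen` yourself.

```python
from urllib.request import Request


def commit_request(base_url, core):
    """Prepare a POST that sends the raw XML commit command to /update."""
    return Request(
        f"{base_url.rstrip('/')}/solr/{core}/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# "mycore" is a hypothetical core name; nothing is sent until you call
# urllib.request.urlopen(request).
request = commit_request("http://localhost:8983", "mycore")
```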

The NIH Syndrome in Action: Calling Solr’s REST API

by Xavier Comments: 0

As I mentioned in my previous post, I was having a chat with a search architect on C# and Solr and he was telling me how they were going to call Solr using its REST API, which IMHO is not the best way to go.

I recommended SolrNet and I stick with this recommendation because you do not want to reinvent the wheel. SolrNet has been built over the course of several years, there are plenty of people using it, and its developers have had the time to understand the functionality that Solr provides and code it accordingly. The problem that you will face if you implement everything via REST calls is that you need to take all possible scenarios into consideration, which in a lot of cases does not happen, and that’s when exceptions start to occur.

In some cases it is not possible to use a library, as there may be legal implications. But in other cases it is because of the NIH syndrome, that is, the Not Invented Here syndrome.

Sometimes teams prefer to have everything created in house. In my opinion that may apply pretty well if you have a closed library as you are at the mercy of the company’s support and sometimes even their ability to profit and therefore continue to exist.

But if it is open source, like SolrNet, then don’t be afraid. You are in control: you can modify the code if need be (as with any open source, following the licensing terms, of course) and you can definitely benefit from the experience of many others that have traveled the same road you are currently on.

But of course, be nice. If you make an improvement, please contribute back to the SolrNet community. Then we all mutually benefit.

Five ways meetings suck and how to make them rock!

by Xavier Comments: 2

Meetings are a double-edged sword. On one hand they are very useful for brainstorming and for communication within and between teams, and in general, when used appropriately, they spark collaboration and move projects forward.

The problem comes when meetings are misused or abused, which sadly happens more often than not. And that’s why I want to tell you 5 reasons why meetings suck and how to make them rock!

#1 People use them to keep busy vs being productive: I’ve seen many times how people set up meetings because for them going to the meeting is the work. Even worse, a lot of meetings end with “let’s set up another meeting to continue the discussion”. This is very common among Project Managers or Product Managers. Just as a fun mental exercise, take a couple of minutes and think if you know a few. I do.

How to deal with it: Make sure that every meeting that you attend or have control over has a real and very clear objective. Even more, make sure that it is important that this meeting takes place. If the objective is not important or required, then simply defer it until it is the right time.

#2 A 1 hour meeting is not a 1 hour meeting: Have you ever been in a 1 hour meeting with 15 other folks, thinking about how much work you actually have to get done instead of just sitting there waiting for the 1 hour mark or your 2 minute turn to talk? Meanwhile, the guy next to you seems to be thinking the same while playing Candy Crush or, in the best case, answering emails. Well, it is much worse than that. A 1 hour meeting with 16 people is 16 person-hours, which is actually 2 full working days lost forever, gone in time. I shiver just to think of how many dollars are wasted because of overcrowded meetings!

How to deal with it: Not everyone needs to be invited to every meeting. Make sure each person has a clear and known reason for being there and (in a friendly way) kick out of the room anyone that doesn’t. Also, as the old adage says, “divide and conquer”. A very broad topic where too many people are involved can be broken down into smaller chunks with smaller groups. You only need one person to coordinate among the teams, and 1 is always much better than a committee. This is very common with Scrums that get abused and end up looking more like office parties. Bonus points if people bring food, as that distracts even further!

#3 Meetings without agenda: Another reason why some meetings take much longer than expected is that there isn’t a clear agenda and objective. This leads to endless talking, going off on tangents and, again, scheduling new meetings to continue the conversation. People also confuse a meeting agenda with just the subject of the meeting. For example, “Discussion on data” is not clear; it is broad and subjective. This scenario goes hand in hand with an overall project vision that is not very clear or has many unknowns.

How to deal with it: Create a clear agenda with well defined points and if possible, use my favorite feature of Scrum: timebox. Parkinson’s law is the adage that “work expands so as to fill the time available for its completion” and this applies very well to meetings. If you don’t have a limit, people will use all time possible for a discussion. Timebox in a reasonable way and you may see wonderful results. And if the issue is overall project vision, try planning shorter term until there is a clearer view of the road ahead.

#4 Interruptions break flow for makers: If you work hard on improving yourself you are probably familiar with the concept of “flow”, but if you are not then internalize this: “flow is the mental state of operation in which a person performing an activity is fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity”. This is when you get a lot done. Developers usually experience this, without knowing it, when “time flies”. What really happened was that the brain was fully focused and engaged in a task, with potentially wonderful results.

There is something else that you should be familiar with, which is the “Maker’s Schedule, Manager’s Schedule”. Let me explain. If you are a maker, e.g. a programmer, you most likely work in large chunks of time, for example from 8 am to noon. During that time you focus on solving a problem, and if you are interrupted, it may take up to 15 minutes to pick up where you left off. This means that if there are multiple meetings throughout the day, your maker’s schedule gets broken down into many small chunks that do not allow for focused, full involvement in a task. I worked on multiple projects in Microsoft’s main campus in Seattle and it seemed like everyone in this particular division worked in 1 hour intervals, which wreaked havoc on development work. Culprit: too many (project) managers in the same building. Real development took place somewhere else.

How to deal with the manager’s schedule: Managers usually have smaller chunks of work that can be broken down into 1 hour slots, hence their impulse to “book meetings with the team to catch up”. They may have the best intentions, but this usually has a negative impact. Even worse, managers are higher in the food chain, so let me ask you this: when was the last time that you simply cancelled a meeting with your boss because he was interrupting you?

How to deal with the maker’s schedule: The first step is for the manager to understand the impact he is having on his whole team with constant interruptions. This sometimes requires the maker to approach the manager and explain the implications of those interruptions. Then the manager should aim to schedule (required) meetings when they cause the least impact, which can be near the start of the day, noon or the end of the day. Also, use asynchronous communication. This sounds complex, but it is basically email, chat or, even better, something like Jira to manage the makers’ progress.

#5 Another chance for talkers to take the stage, not doers: This is a very tricky one, as it requires a leader that is aware of the capabilities and responsibilities of the team members attending the meeting. The problem usually lies in the fact that human personalities are extremely different, so some people tend to talk a lot while others prefer silence. And talking is not directly proportional to doing. Someone might talk a lot and do very little, while the actual guy doing the work will just sit there and listen. This then creates an unbalanced and biased view, as not all of the involved parties will communicate appropriately and the outcome of the meeting might not be the best.

How to deal with it: It is the meeting organizer’s responsibility to understand how each of the invited parties is involved in each point of the meeting agenda. Then ask each of them specific questions on their topics to get the correct information and help obtain the best possible outcome for the meeting.

In summary, meetings can help your team leap forward. Just remember to schedule meetings only when absolutely required instead of as a way to keep busy, to invite only those that are definitely required, to have a very clear agenda and objective, to pick times that don’t interrupt the maker’s schedule, and to guide the meeting so that everyone contributes their share.

Any other recommendations?

Search is one of the most misunderstood functionalities in IT

by Xavier Comments: 0

There is a phrase I use all the time: “Search is one of the most misunderstood functionalities in IT”. And I think it is very accurate.

The problem lies in two different aspects:

  1. Developers don’t know how to use search engines. And that is OK: search engines can be hard to tune appropriately and it is a specialised niche. In some cases, search engines are also awfully expensive.
  2. Developers are lazy. Let me explain this one.

Let’s say that I am setting up an application for selling cars. Potential customers always look for the same things, which are make, model, year, sort by price and so on and so forth. There is a set of metadata that is important and required to find what you are looking for. So what is the solution to this problem?

The lazy solution is to use a database where each field is stored in a separate column and to query each field separately, just like in the following image. It is a mistake or at least a UX horror. I hate database driven search, but that is just my personal opinion.

A typical database driven search input

The correct way of doing it is by providing a single search box. How? Like this:

A proper search box
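One way a single box can work behind the scenes, sketched here under my own assumptions, is to hand whatever the user typed to Solr’s eDisMax query parser and let it search several fields at once through the `qf` parameter. The core name `cars` and the field names `make`, `model` and `year` are hypothetical; they are just the car example from above.

```python
from urllib.parse import urlencode


def single_box_query(base_url, core, user_text):
    """Build a select URL that searches several fields from one text box."""
    params = {
        "defType": "edismax",      # query parser that accepts free-form input
        "q": user_text,            # exactly what the user typed
        "qf": "make model year",   # hypothetical fields to search across
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# "cars" is a hypothetical core name.
url = single_box_query("http://localhost:8983", "cars", "2015 honda civic")
```

The user types one sentence and Solr works out which fields match, instead of the user filling in one input per database column.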

If you want to learn how, take a look at my Pluralsight course to get started with enterprise search using Apache Solr!