Helping .NET Developers Understand Search & Big Data

Stemming and Multi Language

I received a question today about stemming and multiple languages. Basically: “Why do we need multiple fields in different languages in our Solr, and how do I test multi language stemming?”

First of all, let’s explain what stemming is. Stemming involves reducing words to their stem (or base or root) during indexing and querying in an effort to improve recall.

For example, if a document includes the phrase “Xavier walked to work every morning from Westside Parkway” and a user searches for walk, then the results will correctly include that document, because walked is reduced to walk at index time.

Stemming is not perfect, because in some cases the algorithm performs reductions that are not appropriate, for example news -> new. Still, the overall improvement in results means that applying stemming is much better than not applying it.
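To make the idea concrete, here is a deliberately naive suffix-stripping sketch in Python. It is not the algorithm Solr uses; the function name and the suffix list are made up for illustration, and a real stemmer has many more rules. It does show both the benefit (walked matches walk) and the kind of over-reduction mentioned above (news becomes new).

```python
def naive_stem(word):
    """Strip a few common English suffixes. Illustrative only, not Solr's stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Index time: every token in the document is reduced to its stem.
document = "Xavier walked to work every morning from Westside Parkway"
indexed_terms = {naive_stem(token) for token in document.split()}

# Query time: the search term goes through the same reduction.
query_term = naive_stem("walk")
matches = query_term in indexed_terms
```

Because the same reduction runs on both sides, the query term and the indexed term meet at the same stem, which is what improves recall.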

The next step to improve results would be lemmatization, which reduces each word not to its stem but to its lemma, a more advanced approach. For example, better would be reduced to good, which stemming alone cannot do. This is achieved with dictionaries, but it is a much more complex process and, because of the added cost, it should be tackled at a later stage, after the effects of stemming have been analyzed in depth.

Multi Language

Now, on to multi language to answer the original question. One of the features that can be added to a Solr implementation for multi language applications is the use of language specific fields, each of which applies the rules of its language, including stemming. This means that there are separate fields per language for title, body and so on, so that each search applies language specific rules.

Let’s see one as an example and we will use the Analysis screen to show the results. As background, the Analysis screen is part of the Solr Admin UI and it shows you how words are treated at index and query time. If you want more information on the Analyzer, please head to Solr’s Wiki: https://cwiki.apache.org/confluence/display/solr/Running+Your+Analyzer

So here is an example of stemming in English.

If you search for audits it will match audit. You can see this by going into the Analysis screen, selecting the English body field, and typing audits into the query box and audit into the index box.

[Screenshot: Analysis screen for audits]

And these are the results, with the index side on the left and the query side on the right. As you can see, this would be a match!

[Screenshot: analysis results for audits]

Now, let’s try this with a different language. I will use the word alquileres, which is the plural of alquiler in Spanish. Spanish stemming should correctly reduce the word to alquiler, but if I select a field for a different language it should not, because Solr is not applying Spanish rules to that field.

[Screenshot: Analysis screen for alquileres]

And as expected, the rule is not applied correctly. Alquileres is stemmed to alquiere and thus it is different from alquiler.

[Screenshot: analysis results for alquileres in a non-Spanish field]

But if I change to a Spanish field, namely text_es, the Spanish stemming rules kick in, reducing alquileres to alquiler, which is correct.

[Screenshot: analysis results for alquileres in text_es]

And this should apply to every language that has a language specific field. What you need in order to work with each language is an understanding of what its stemming rules are.
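If you would rather script this check than click through the Analysis screen, the same comparison can, as far as I know, be requested from Solr's field analysis handler at /analysis/field. The sketch below only builds the URL. The core name, the helper name and the assumption that the handler is enabled on your core are mine, so verify them against your own installation.

```python
from urllib.parse import urlencode


def analysis_url(base_url, core, field_type, index_text, query_text):
    """Build a field-analysis request comparing index-side and query-side text."""
    params = {
        "analysis.fieldtype": field_type,    # e.g. text_es
        "analysis.fieldvalue": index_text,   # analyzed as if it were being indexed
        "analysis.query": query_text,        # analyzed as if it were a query
        "analysis.showmatch": "true",
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


# "yourcore" is a placeholder core name.
url = analysis_url("http://localhost:8983/solr", "yourcore", "text_es", "alquiler", "alquileres")
```

Running it once per language field type gives you a repeatable way to test each language's stemming.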

Hope this helps!

On Getting Started with Agile

I was having a conversation today with a person who needed some help teaching Agile to his PMs. I had a very simple response: get them started by watching the excellent training available on Pluralsight.

So, the first thing I told him was:

– Agile has proven to be a successful methodology in software… when done right
– And it is just like talking to someone from your team: unless that person is fully convinced that whatever you are telling them is going to work, they are not going to do what you say. This means a person can take a lot of Agile courses and certifications, but unless this person is fully convinced it will work and follows the methodology, it won’t work as expected. You can spend thousands of dollars training people, but unless they internalize it you might as well use that money someplace else.
– And being Agile means you really need to be disciplined. Daily standups are…. daily. Timeboxing is critical. A standup is a standup, not a requirements gathering or technical meeting.

Anyway, as a first step to get started with Agile I recommend taking the following courses on Pluralsight, which cover Agile pretty well.

This is my suggested order: (click on each one to be taken directly to the training)

Configuring Spell Correction in Solr

Today I am configuring spell correction in Solr 5.5. Enabling it is not very hard. Simply select which spellcheck component you want to use; see here for the alternatives: https://cwiki.apache.org/confluence/display/solr/Spell+Checking

There are several, but I selected solr.IndexBasedSpellChecker, which works for what I need. I replaced the one that comes in solrconfig.xml and then added spellcheck to the request handler’s last-components. Reindexed, committed and it works.

Most people stop here, but I wanted to learn more, so here is some very good recommended reading to understand spellchecking better:

Getting started Spell Checking with Apache Lucene and Solr

Which references a more technical post

http://norvig.com/spell-correct.html

That goes even into more technical depth

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36180.pdf

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=52A3B869596656C9DA285DCE83A0339F?doi=10.1.1.146.4390&rep=rep1&type=pdf

 

 

Error Creating Cores in Solr 5.5

There are multiple ways of creating cores in Solr, and it is very straightforward. One way is to call Solr’s REST admin endpoint with action=CREATE, and another is to use bin\solr.cmd. However, you could run into a small issue, so let me quickly explain the scenario.

First of all, you can create using solr.cmd with the following command:

bin\solr.cmd create -c <nameofthecore>

And a fresh new core is created. The command echoes back the call it made: http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses

[Screenshot: Create core]

So then what if you are curious and decide to make the call directly yourself: (of course, changing core name)

http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses

Well, it does not work!

[Screenshot: Failed Create]

The hint there is that it can’t find some resources, namely solrconfig.xml. To solve this issue, you only need to specify which base configuration you want to use. So the call would be:

http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses&configSet=basic_configs

And presto, you get your core! A little detail, but worth knowing what was missing.
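If you create cores from a script, it helps to always build the call with the configSet included so you do not hit the missing solrconfig.xml error. Here is a minimal Python sketch that only assembles the URL shown above. The helper name is hypothetical, and basic_configs is simply the config set used in this post.

```python
from urllib.parse import urlencode


def create_core_url(base_url, name, config_set="basic_configs"):
    """Build the core admin CREATE call, including the configSet to base the core on."""
    params = {
        "action": "CREATE",
        "name": name,
        "instanceDir": name,
        "configSet": config_set,
    }
    return f"{base_url}/admin/cores?{urlencode(params)}"


url = create_core_url("http://localhost:8983/solr", "othercourses")
```

You can then paste the result into a browser or send it with urllib.request.urlopen(url).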

[Screenshot: core created]

My Toughest Crowd

Life is like a box of chocolates. You never know what you are going to get!

A friend of mine, Katherine, volunteers ad honorem in a foundation called Lifting Hands that is aimed towards helping children from a very poor neighborhood in Costa Rica learn new skills and grow up as respectable members of society.

The Request

One day while we were talking about Big Data, Solr and the typical geek stuff we discuss all the time, she asked me if I wanted to go one afternoon and talk to her kids about what it was like to grow up to be a computer programmer and hopefully motivate them. It was two groups, 10-12 and 12-14 year olds.

What I Thought

Piece of cake. I am pretty good at presenting. I’ve done it in front of up to 850 people, spent years as a developer evangelist for Microsoft/Artinsoft and now I enjoy creating content as a Pluralsight author.

People also tell me that I am good at motivating others to get into programming given the passion that I have for this field.

So my answer was a quick yes.

“What could go wrong?”

The Briefing

Katherine then sat down with me to explain everything that I needed to know. These were poor kids, from poor families, in a poor neighborhood (I definitely got the point) and they were there by choice.

“By choice”.

Those words got stuck immediately in my mind. This was great. They were not forced to attend. This is usually a good sign. They must be motivated.

This was going to be easy.

The Message

So my next step was to outline what I was going to tell them. In my mind I was going to focus on how you can set a goal, put some (or a lot) of effort into it and then you most likely will either achieve your goal or at least improve your circumstances.

It sounded simple to me.

Day Zero

And so the day came and there I was walking into Lifting Hands, ready to help some kids. I got to the classroom and kids started to arrive.

Kids being kids naturally wanted to play Plants vs. Zombies and Mario Kart instead of being lectured by a stranger. They also preferred their regular and well known teacher.

So I decided to break the ice and get them to talk. I introduced myself and asked them the logical question.

“What Do You Want to Be When You Grow Up”

For me this was a question for which I was expecting the usual answers, something like fireman, nurse, perhaps a lawyer or two.

When I was that age I wanted to be an economist or stock broker. Ok… I was not the most “normal” kid. By third grade my reading consisted of books like Don Quixote, which was commonly assigned in 11th grade (and even then barely any teenagers actually read it), or Lee Iacocca’s story of how he saved Chrysler.

I did not grow up in a rich family by any means, but my grandfather was a very important politician with a great deal of power, which made my life much different from that of most people I know.

Going back to the kids, I started to get a few answers that I was not prepared for. Maybe it is me, but a very young kid saying “I don’t want to do anything, I don’t want to work” was something that I was not expecting. At least not directly.

Houston We Have a Problem

And things kept going a bit downhill. Just like a German strategist once said, “plans don’t survive contact with the enemy”, or as a boxer put it more emphatically: “everybody has a plan until they get punched in the face”.

And these kids were on the same side as me.

So I tried to tell them about how after the teenage years, things get better. That I was a bit of a rebellious kid, but that I grew up and once I became a father I became an even better son and the world is great.

The Harsh Reality

And then one of the 12 year old girls says to me: “my sister had a kid, and she leaves the 1 year old all the time to go out to drink and take drugs”. Immediately another young kid tells me how his brother steals from time to time to buy drugs and the mother cries every night when he goes out.

At that age my main concerns were getting extra lives for Mario Kart and memorizing River City Ransom’s GUID-like secret key, and I still blushed if I ran into a JCPenney underwear catalog.

Our realities were vastly different.

But don’t get me wrong. These are not bad kids and this foundation is helping them in ways that definitely change the odds of what they will be later in life.

You don’t get to pick where you are born. But you get to pick what you do to change your future.

The Sports Star

We went back to what they wanted to be when they grew up. They kept giving me answers, some of them a bit more unexpected than others. But there was one that kept coming up repeatedly: “I want to be a famous soccer player”, and they were very specific, naming their favorite. Funnily enough, when he is in Costa Rica he lives about 200 feet away from my house. It is a big deal for many, even adults. But I don’t care about sports. If he were a famous computer programmer it would be different. Anyway…

On one hand it was good that they had aspirations. But on the other hand I just kept thinking “a sports star”…

My Point of View

Being a sports star can be great. If you are a famous sports star you can make millions. They date supermodels, have fancy cars and they are lucky if they don’t end up broke and on drugs. And their careers can be short, as you rarely see a 40-something sports star. The same goes for movie stars: even if you succeed you can blow up pretty easily, for many reasons that I will not discuss.

But is it really going to happen? My guess is that for 99.99% of kids who dream about being a sports star, it really won’t.

Even worse, a lot of these kids will not be able to make it and they will hurt their chances of making something else of their lives.

So, what do I think they should focus on instead?

My Advice

The world is full of opportunities for those who really work hard and, even better, who also work smart. Perhaps I am biased, as I am a computer programmer. I did study at a 4 year, full time, not much time for fun, sometimes absurdly tough university, but in programming this is not the only way.

And that’s one of the reasons why Pluralsight has been so successful. If you don’t know Pluralsight, it is the company for which John, yours truly, and 700 other authors have created technical courses with the aim of democratizing technical training. This simply means providing an affordable way of getting access to the best technical resources out there.

And you don’t need to be a computer programming major to work as a computer programmer. As a matter of fact, a great deal of programmers even in the United States started their life working in something else. I frequently run into accountants, English majors, nurses, nutritionists and many from other professions that ended up in software development.

In my case I had to take compiler classes, assembly language, complex mathematics and statistics, but in plenty of cases all this studying does not guarantee success.

Let me tell you a few things that might help.

There is a Way

First of all, don’t think that I am telling you to skip a formal education. If you have the chance to go to university, by all means do it as it is a great base.

But if you can’t, there are things you can do. Life is full of opportunities for those willing to go after them. And that’s where many fail. Many want to have, but don’t want to work for it.

First Step of the Plan

“Find your passion” is repeated over and over again. But it is not an absolute must. You can start with something you like, or something that you are good at and don’t mind doing.

And then get damn good at it. My example is as a software developer, but it could also be as a designer or related field. Practice makes perfect.

And try to get real world experience as quickly as possible. Keep grinding forward. There is a phrase I always tell myself when I want to achieve an objective.

“Be disciplined, systematic and constant”

So you hone your skills, get real world experience and you will most likely get a job.

As an example, right now if you are very good with JavaScript (even better if you specialize in AngularJS, Node.js, React, or similar), let me tell you that it will be really, really hard to be unemployed. You will walk by a technical event and leave accidentally employed.

And now you will be able to have a decent paying, nice job that will allow you to live decently.

But I Don’t Want a Job

That’s right. But this is me. I had a steady job circa 2004. I was touring the world on an all expenses paid trip teaching Microsoft customers how to move to the latest 64 bit hardware. All was great!

But I decided to leave a steady paycheck and fail as an entrepreneur while young.

A steady job is good, but what if you wanted to have freedom and be able to go past the glass ceiling?

If you don’t know what the glass ceiling is, it means that when you grow within a company you get to a point where you can’t move any higher. There are no promotions for you. You can sometimes tell this happened to someone when they move horizontally within an organization. And their motivation may not be the highest.

And so you can work your way into entrepreneurship. I am not going to tell you how, as this is something John covers pretty well. He calls it “separate yourself from the pack”.

But once you have moved from your job to entrepreneurship, doing something you love, then you will be in a great position. Because you will…

Never Work a Day in Your Life Again

If you truly love what you do, you do it well and it provides economically to sustain you, then life takes a new meaning. You will find yourself working harder and pushing yourself because you thoroughly enjoy what you do.

That’s how I feel while I “work” (play, really) on some of my projects, and it feels amazing!

In Summary

  • Something that at face value seems simple, may have a surprise or two in store for you
  • Things may not work out how you planned them
  • And this is ok. Just be prepared
  • Some people are born with an advantage in life, but that does not define what your future will be
  • It is up to you to create your future
  • Work hard, move ahead, take opportunities as they present themselves
  • Always strive to be better… But don’t compare yourself. Try to be a better you

 

And that’s my story and piece of advice that I can share with you. I hope the best is in store for you.

Monitoring a Solr Cluster

A couple of days ago I got asked: how do we monitor our cluster? Well, there are professional ways and others for the budget conscious deployment. Here are a few options that came to mind:

  • You have the ping request handler, which can be used to determine if a node is up and running. This is useful if you want to configure the load balancer to determine which nodes are responding.
  • Additionally, I’ve seen environments where a monitoring service issues several predefined queries at a predefined interval and notifies you if no response is received. Something like http://www.site24x7.com/ but behind the firewall. I do not know which monitoring services, if any, you might have.
  • And there are more specialized tools, for example Sematext, although some of them are more Linux friendly, so it is necessary to look for Windows counterparts if you don’t have Linux.
  • Also you can use the clusterstate.json from ZooKeeper (this would be the one from prod: https:///solr/zookeeper?detail=true&path=/clusterstate.json), which will tell you the state of the nodes. You just need to do a bit of parsing, which can be done pretty easily with Json.NET, which is easy to learn.
  • And regarding monitoring your cluster’s use in terms of queries done, you can definitely use the Solr logs and analyze the queries.
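As a rough illustration of the clusterstate.json option, here is a small sketch of the parsing step. It is written in Python rather than Json.NET, and it assumes the layout I have usually seen (collection, then shards, then replicas, each with a state). The sample data and names are hypothetical, and the ZooKeeper endpoint may wrap this document in an outer response that you need to unwrap first, so check against your own cluster.

```python
import json


def inactive_replicas(clusterstate):
    """Return (collection, shard, replica, state) for every replica that is not active."""
    problems = []
    for collection, info in clusterstate.items():
        for shard, shard_info in info.get("shards", {}).items():
            for replica, replica_info in shard_info.get("replicas", {}).items():
                state = replica_info.get("state")
                if state != "active":
                    problems.append((collection, shard, replica, state))
    return problems


# Hypothetical sample standing in for the parsed content of clusterstate.json.
sample = json.loads("""
{
  "courses": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"state": "active", "node_name": "10.0.0.1:8983_solr"},
          "core_node2": {"state": "down", "node_name": "10.0.0.2:8983_solr"}
        }
      }
    }
  }
}
""")

down = inactive_replicas(sample)
```

A scheduled job could run this and send a notification whenever the returned list is not empty.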

Just a few thoughts I wanted to share.

How to Query for a Null or Empty Value in SOLR

I had to look for empty values in a mandatory field in SOLR today. Wait, what? Shouldn’t mandatory values in the index be marked as required="true" when you are defining the field?

Well yes, but some people forget to do it, or maybe the spec was not fully completed when they worked on the schema, so they did not include it… just in case! (YAGNI definitely comes to mind)

Well, in any case I had to find which documents did not have the publication date (which sounds like a really really really mandatory field to me).

So how do you identify them?

Option A: Query *:* and start paginating, taking notes of which documents do not have the value… Ok, this is a totally brute force approach. But I wouldn’t be too surprised to find someone doing it. The things I have seen…

Option B: Query *:* and in your fl include only id and publicationdate. Paginate or add enough rows. Very amateur, but a bit better than before.

Option C: Query *:*, include only the two fields in fl and sort asc! Much better, as your results will have the documents with empty values at the beginning.

Option Winner: Or, instead of *:*, simply use the nice flexibility of a specific query and use q=-publicationdate:*

This is definitely the best approach. As I just demonstrated, there are many ways of finding a solution, ranging from the most effort to the most effective. I always strive to do things as efficiently as possible, and so should you!
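The winning query can also be issued from code. Here is a minimal Python sketch that builds the request. The core name, the /select handler path and the helper name are assumptions for illustration; the important part is the negative query on the field.

```python
from urllib.parse import urlencode


def missing_field_query(base_url, core, field, rows=100):
    """Build a select URL that returns documents with no value in `field`."""
    params = {
        "q": f"-{field}:*",     # negative query: everything without a value in the field
        "fl": f"id,{field}",    # only return what is needed to identify the documents
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# "yourcore" is a placeholder core name.
url = missing_field_query("http://localhost:8983/solr", "yourcore", "publicationdate")
```

urlencode takes care of escaping the colon and the asterisk, which is easy to get wrong when typing the URL by hand.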

 

How to Send Solr Optimize Command

There are times when you want to optimize your Solr index. But what is optimize and why do I care?

Optimize is similar to defragmenting your hard drive. Solr will create a new index, removing any deleted documents. It is simply housekeeping at its best.

I usually do an optimize from the Admin UI, going to the overview tab.

[Screenshot: Solr Optimize]

However, sometimes we might want to do it programmatically, a good example being when you have a spell checker configured to build its dictionary on optimize. The URL to optimize is very simple. Here is an example with my localhost; just replace it with your Solr:

http://localhost:8983/solr/yourcore/update?stream.body=<optimize><query>*:*</query></optimize>

 

Notice how the # is removed from all REST calls vs when the Admin UI loads.
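If you prefer to build the optimize call in code, a minimal Python sketch follows. Note that it uses the optimize=true parameter of the update handler rather than the stream.body form shown above, which I understand to have the same effect. The helper name and core name are placeholders.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core):
    """Build an update URL that asks Solr to optimize the given core."""
    return f"{base_url}/{core}/update?{urlencode({'optimize': 'true'})}"


# "yourcore" is a placeholder core name.
url = optimize_url("http://localhost:8983/solr", "yourcore")
```

Sending it is then a matter of calling urllib.request.urlopen(url) from a scheduled job, for example right after a large reindex.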

Happy optimizing!

Easy way to do a Solr Core Reload

Many times I have stopped and restarted Solr to reload a core. Yes, it is kind of a rookie way, as you can always go to the Admin UI, Core Admin, and reload the core.

But what if you wanted to have a really fast way of reloading your core?

Just do it via the admin handler!
http://{SOLR IP}:{SOLR PORT}/solr/admin/cores?action=RELOAD&core={CORE NAME}

You can even add it to your code and make a simple call or better yet use SolrNet via the admin functionality found below:

https://github.com/mausch/SolrNet/blob/master/Documentation/Core-admin.md

 

Easiest Way of Issuing Solr Commit Command

There are a couple of ways to trigger a commit command in Solr. The easiest way is via a URL:

http://localhost:8983/solr/collection1/update?commit=true

(Replace localhost:8983 with your Solr URL and collection1 with your core name)

But you can also commit using the Documents option from the Admin UI. Simply navigate to Documents, using this URL:

http://localhost:8983/solr/collection1/documents

And select Solr Command (raw XML or JSON), adding the command

<commit/>

[Screenshot: Solr Command]

Submit Document! It just works. And if you are using SolrCloud, the command goes to everyone.
