How to Query for a Null or Empty Value in SOLR

by Xavier Comments: 0

I had to look for empty values in a mandatory field in SOLR today. Wait, what? Shouldn’t mandatory fields in the index be marked as required="true" when you define the field?

Well yes, but some people forget to do it, or maybe the spec was not complete when they worked on the schema, so they did not include it… just in case! (YAGNI definitely comes to mind.)

Well, in any case I had to find which documents did not have the publication date (which sounds like a really really really mandatory field to me).

So how do you identify them?

Option A: Query *:* and start paginating, taking notes of which documents do not have the value… OK, this is a totally brute-force approach, but I wouldn’t be too surprised to find someone doing it. The things I have seen…

Option B: Query *:* and include only id and publicationdate in your fl. Paginate or ask for enough rows. Very amateur, but a bit better than before.

Option C: Query *:*, include only the two fields in fl and sort by publicationdate asc! Much better, as the documents with no value will come first in your results (as long as the field type does not set sortMissingLast).

Option Winner: or instead of *:* simply use the nice flexibility of a specific query and use q=-publicationdate:*

This is definitely the best approach, and as I just demonstrated, there are many ways of reaching a solution, ranging from the most effort to the most effective. I strive to always do things as efficiently as possible, and so should you!
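Here is a minimal sketch of how the winning query could be turned into a full request URL from a script, using only Python’s standard library. The base URL, the core name yourcore and the id field are assumptions for illustration; publicationdate is the field from this post.

```python
from urllib.parse import urlencode


def missing_field_url(base_url, core, field, fl=("id",), rows=10):
    """Build a /select URL that matches documents with no value in `field`."""
    params = {
        "q": f"-{field}:*",   # negative query: every document without the field
        "fl": ",".join(fl),   # only return the fields we care about
        "rows": rows,
        "wt": "json",
    }
    # urlencode percent-encodes the ':' and '*' so the URL is safe to send as is
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Placeholder host and core name; replace with your own.
print(missing_field_url("http://localhost:8983/solr", "yourcore", "publicationdate"))
```

The function only builds the URL; fetch it with whatever HTTP client you prefer.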

 

How to Send Solr Optimize Command

by Xavier Comments: 1

There are times when you want to optimize your Solr index. But what is optimize and why do I care?

Optimize is similar to when you defragment your hard drive. Solr will create a new index removing any deleted documents. It is simply house keeping at its best.

I usually run an optimize from the Admin UI, on the core’s overview tab.

 

However, sometimes we might want to do it programmatically, a good example being when you have a spell checker configured to build the dictionary on optimize. The URL to optimize is very simple. Here is an example with my localhost; just replace the host and core name with your own:

http://localhost:8983/solr/yourcore/update?optimize=true

 

Notice that there is no # in the URL of a REST call, unlike the URLs you see when the Admin UI loads.
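If you want to trigger it from a script, a minimal sketch could look like the following. It uses the optimize=true request parameter on the update handler; the host, the core name yourcore and the timeout are placeholders, and the function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url, core):
    """Return the update-handler URL that asks Solr to optimize `core`."""
    return f"{base_url}/{core}/update?{urlencode({'optimize': 'true'})}"


def optimize(base_url, core, timeout=600):
    """Send the optimize request and return the HTTP status code.

    Optimizing a large index can take a while, hence the generous timeout.
    """
    with urlopen(optimize_url(base_url, core), timeout=timeout) as resp:
        return resp.status


# Only prints the URL; call optimize(...) against a running Solr to send it.
print(optimize_url("http://localhost:8983/solr", "yourcore"))
```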

Happy optimizing!

Easy way to do a Solr Core Reload

by Xavier Comments: 0

Many times I have stopped and restarted Solr just to reload a core. Yes, it is kind of a rookie way, as you can always go to the Admin UI, open Core Admin and reload the core.

But what if you wanted to have a really fast way of reloading your core?

Just do it via the admin handler!
http://{SOLR IP}:{SOLR PORT}/solr/admin/cores?action=RELOAD&core={CORE NAME}

You can even add it to your code and make a simple call, or better yet, use SolrNet’s core admin functionality, documented here:

https://github.com/mausch/SolrNet/blob/master/Documentation/Core-admin.md
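For the “simple call” route, here is a minimal sketch using only Python’s standard library. The host, port and core name are placeholders, and reload_url and reload_core are names made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(host, port, core):
    """Build the CoreAdmin URL that reloads `core`."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def reload_core(host, port, core, timeout=30):
    """Call the CoreAdmin handler and return the HTTP status code."""
    with urlopen(reload_url(host, port, core), timeout=timeout) as resp:
        return resp.status


# Only prints the URL; call reload_core(...) against a running Solr to send it.
print(reload_url("localhost", 8983, "collection1"))
```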

 

Easiest Way of Issuing Solr Commit Command

by Xavier Comments: 0

There are a couple of ways to trigger a commit command in Solr. The easiest way is via a URL:

http://localhost:8983/solr/collection1/update?commit=true

(Replace localhost:8983 with your Solr URL and collection1 with your core name.)

But you can also commit using the Documents option from the Admin UI. Simply navigate to Documents, using this URL:

http://localhost:8983/solr/#/collection1/documents

And select Solr Command (raw XML or JSON), adding the command

<commit>
true
</commit>

Click Submit Document and it just works. And if you are using SolrCloud, the commit goes out to every node.
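The same commit can also be sent from code instead of the Admin UI. The sketch below builds a POST request carrying Solr’s JSON commit command; the host and the core name collection1 are placeholders, and commit_request is a name made up for this example.

```python
import json
from urllib.request import Request


def commit_request(base_url, core):
    """Build a POST to the update handler with a JSON commit command."""
    body = json.dumps({"commit": {}}).encode("utf-8")
    return Request(
        f"{base_url}/{core}/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = commit_request("http://localhost:8983/solr", "collection1")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```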

Quick Tip: Find Missing Fields in Apache Solr

by Xavier Comments: 0

I had an issue raised because of a mismatch between my document results and my facet counts. The issue is basically that there is a field that is not required and, in most cases, the field is added with an empty string, which is OK as empty has a meaning. However, in a few cases the field is not added at all, and this is not the expected scenario. So I needed to find out why this happened, which means finding the document id so that it can be reviewed during indexing.

Oh well… I was tired, so I ran a *:* query and got all results… too much text.

Query for all: q=*:*

Then I put only the two fields that I needed in fl, and set rows high enough to see them all. This was kind of slow and inconvenient.

Query for all with only required fields and all rows: q=*:*&fl=title myfacet&rows=1600

Then I remembered the query for missing fields! Just use the - operator on a field name.

Query for documents with missing fields: q=-myfacet:*

Problem solved. Easy as pie!
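To make the difference between “empty” and “missing” concrete, here is a small sketch that sorts the docs of a Solr JSON response into three buckets on the client side. The sample documents are made up, and the id key is an assumption (use whatever your unique key is called); myfacet is the field from this post.

```python
def classify(docs, field, key="id"):
    """Split docs into those missing `field`, those with "" and those with a value."""
    missing, empty, filled = [], [], []
    for doc in docs:
        if field not in doc:
            missing.append(doc[key])   # field was never added: the unexpected case
        elif doc[field] == "":
            empty.append(doc[key])     # field present but empty: fine here
        else:
            filled.append(doc[key])
    return missing, empty, filled


# Made-up documents, shaped like the "docs" list of a Solr JSON response.
sample_docs = [
    {"id": "1", "title": "A", "myfacet": "news"},
    {"id": "2", "title": "B", "myfacet": ""},
    {"id": "3", "title": "C"},
]

missing, empty, filled = classify(sample_docs, "myfacet")
print("missing:", missing, "empty:", empty, "filled:", filled)
```

The documents that land in the missing bucket are the ones that q=-myfacet:* is after.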

How to Start a Development Apache Solr

by Xavier Comments: 3

I am preparing for a presentation this month on Solr and SolrNet for the Atlanta .NET User Group. Solr 5 is already out, but I will be running my demos using Solr 4.10. Now that I am starting the preparation process, it really feels good to know that starting a local Solr is SO EASY. Check out the steps, which couldn’t be easier:

– Assuming you already downloaded Solr (here if you haven’t: http://lucene.apache.org/solr/downloads.html)

– Just extract into a folder. Mine is called AtlantaSolr

– Make sure you have Java running. If unsure just type java -version


– Now navigate to your Solr folder, in my case C:\Dropbox\Public Speaking\AtlantaSolrSolrNet

– Type the magic words java -jar start.jar (in a stock Solr 4.10 download, start.jar sits in the example subfolder) and let it load.

– Voila! Navigate to localhost:8983/solr


It couldn’t be easier!

How to change the Solr request handler with SolrNet and ServiceLocator

by Xavier Comments: 1

I am working on a project that sounds like heaven to me. Big company, hundreds of developers, latest technology all around, totally agile, and the search is done with Solr and a REST API in C# which of course uses SolrNet (who would think otherwise?).

In any case, spellcheck was enabled and this wreaked havoc whenever servers were rebooted. It seems that SolrCloud has an issue with spellchecking. The problem is that setting up spellcheck in the /select request handler makes Solr spin its wheels for a long time while starting, and it has been tracked in https://issues.apache.org/jira/browse/SOLR-6679. The recommended workaround is to set up spellcheck in a different request handler.

But here is the problem: in SolrNet you can’t easily specify the request handler explicitly. It basically uses /select. The request handler is set via the Handler property of SolrQueryExecuter, the class behind ISolrQueryExecuter. You can see it in action here:

https://github.com/mausch/SolrNet/blob/master/SolrNet/Impl/SolrQueryExecuter.cs

I checked through many forums and threads to try to get to a solution and here are some of the threads I found:

Changing Handler endpoint in SolrQueryExecuter? https://groups.google.com/forum/?fromgroups=#!searchin/solrnet/handler/solrnet/Kqxn68pU0uo/uG50WSxu_swJ

How can I perform solrnet query in two different request handler? https://groups.google.com/forum/#!topic/solrnet/ZA-bv9dkh_0

Different request handler https://groups.google.com/forum/#!topic/solrnet/SP14XmifcrY

Calling Custom Request Handler https://groups.google.com/forum/#!topic/solrnet/THX-ADS5CLQ

http://stackoverflow.com/questions/13393700/how-we-changes-standard-query-handler

There were many recommendations, among them:

–          Use qt. The problem is that it is deprecated and I think it will not be available in Solr 5. And there is a lot of pushback against qt.

–          Move to CastleWindsor, which is what Eduardo, a friend of mine, did last week. I don’t have enough time to do this; I am on a really tight schedule.

–          Replace the Handler property in SolrQueryExecuter, which is what was recommended in one of the threads

–          Or Remove() and re Register() the ISolrQueryExecuter

One of the recommendations was to modify the Handler this way:

Startup.Init<T>(new SolrConnection("http://localhost:8080/solr"));

var executor = ServiceLocator.Current.GetInstance<ISolrQueryExecuter<T>>() as SolrQueryExecuter<T>;

executor.Handler = "/new";

That did not work for me, as it did not seem to modify the registered instance. I have no idea why, but if you know how to fix it, let me know.

So the recommendation from Mauricio Scheffer was to Remove and Reregister, which I did not know how to do but Satish, a very friendly developer, helped me. Here is the solution:

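Startup.Init<MyDocument>("http://localhost:8080/solr");

var container = ServiceLocator.Current as SolrNet.Utils.Container;

// Remove the executer that Startup.Init registered (it points at /select)
container.Remove<ISolrQueryExecuter<MyDocument>>();

// Build a replacement from the parts already in the container
var instance = new SolrQueryExecuter<MyDocument>(
    container.GetInstance<ISolrAbstractResponseParser<MyDocument>>(),
    new SolrConnection("http://localhost:8080/solr"),
    container.GetInstance<ISolrQuerySerializer>(),
    container.GetInstance<ISolrFacetQuerySerializer>(),
    container.GetInstance<ISolrMoreLikeThisHandlerQueryResultsParser<MyDocument>>());

instance.Handler = "/yourhandler";

// Re-register it so later lookups get the new executer
// (assumes Container.Register accepts a factory delegate)
container.Register<ISolrQueryExecuter<MyDocument>>(c => instance);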

And this saved my day!

Should I Rename the uniquekey in Solr?

by Xavier Comments: 1

I was asked today if it makes sense to change the name of the default id field in Solr to something else?

Let me explain. You download Solr, get it up and running, and start modifying fields in schema.xml. By default the unique key field is called id and it is declared as the uniqueKey. No harm done in leaving it as is. It looks something like what I have below.

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

….

<uniqueKey>id</uniqueKey>

At first it may seem like it doesn’t matter what it is actually called, however if you think about it a little bit more it may make sense to change it.

Why?

Simple. Because id, being the default name, may be used in other places, even in sample code, and it may have been used for specific purposes or in ways that you are not aware of.

So by changing it, you are forcing yourself to be aware of where the id field is being used. It is not a huge difference, but the devil is in the details.

Something as easy as changing it to look like this is more than enough.

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />

….

<uniqueKey>itemid</uniqueKey>

Want a more precise example? Run SolrNet’s sample app and just change the name of the uniquekey. Tell me how it went!

The NIH Syndrome in Action: Calling Solr’s REST API

by Xavier Comments: 0

As I mentioned in my previous post, I was having a chat with a search architect on C# and Solr and he was telling me how they were going to call Solr using its REST API, which IMHO is not the best way to go.

I recommended SolrNet and I stick with this recommendation because you do not want to reinvent the wheel. SolrNet has been built over the course of several years, there are plenty of people using it, and they have had the time to understand the functionality that Solr provides and code it accordingly. The problem that you will face if you implement everything via REST calls is that you need to take into consideration all possible scenarios, which in a lot of cases does not happen. And that’s when exceptions start to occur.

In some cases it is not possible to use a library as there may be legal implications. But in other cases it is because of the NIH syndrome, or the not invented here syndrome.

Sometimes teams prefer to have everything created in house. In my opinion that may apply pretty well if you have a closed library as you are at the mercy of the company’s support and sometimes even their ability to profit and therefore continue to exist.

But if it is open source, like SolrNet, then don’t be afraid. You are in control, you can modify the code if need be – as with any open source, of course following the licensing terms – and you can definitely benefit from the experience of many others that have traveled the same road you are currently on.

But of course, be nice. If you make an improvement, please contribute back to the SolrNet community. Then we all mutually benefit.

Using Solr from C# Made Dead Easy with SolrNet

by Xavier Comments: 2

I was having a conversation with a solutions architect a few days ago about a specific Solr project with .NET. He was telling me some of the details of what they were going to build, how they were going to call Solr’s REST API for querying and ….

Stop the press… Stop right there… I said….

You are going to call Solr’s REST API? Really? The fact that Solr has a REST API is massively useful, but why do you want to reinvent the wheel?

If you want to use Solr from .NET, I explained, the best way is to use SolrNet (https://github.com/mausch/SolrNet). SolrNet is an Apache Solr client for .NET that lets you use Solr in a super easy and efficient way. It abstracts Solr so well that you basically just create a POCO object (Plain Old CLR Object – not to be wrongly called Plain Old C# Object as many people do) and presto, you have functions to Add, Delete, Commit and more. Let me show you a small sample I built about a week ago in literally just a few minutes to index a few thousand mails from Pluralsight’s (the best online training resource available IMHO – and yes I am a bit biased here!) author distribution list.

To create this demo, where I indexed about 4k emails I did the following

– Download Solr, 4.10 at the time [takes a few minutes to download]

– Clone collection1 and rename it authorslist, don’t forget to change the collection name

– Change schema.xml so that you have the following fields (and remember this is just a quick and dirty example that I wanted to use to show how to provide a searchable index of all author mails inside a WordPress site). It looks like this:

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="sent" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="sendername" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="recipients" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="htmlbody" type="text_general" indexed="false" stored="true"/>

– Make itemid uniquekey instead of id

– Modify solrconfig.xml to add a new requesthandler, so that I can play a bit with facets and weights, nothing out of the ordinary

– And now to index the mails I just wrote a few lines of code to read from my Outlook, using the Office Primary Interop Assemblies (known as PIAs) in a console application

– Add SolrNet to the console app

– Create a POCO object to represent whatever I added to the schema. It should look like this. Please notice the attributes, which is what tells SolrNet what it is representing in Solr. This is my POCO object:

using System;
using System.Collections.Generic;
using SolrNet.Attributes;

public class MailSolr {

    [SolrUniqueKey("itemid")]
    public string ItemId { get; set; }

    [SolrField("subject")]
    public string Subject { get; set; }

    [SolrField("sent")]
    public DateTime Sent { get; set; }

    [SolrField("sendername")]
    public string SenderName { get; set; }

    // multiValued field in the schema, so it maps to a collection
    [SolrField("recipients")]
    public ICollection<string> Recipients { get; set; }

    [SolrField("body")]
    public string Body { get; set; }

    [SolrField("htmlbody")]
    public string HtmlBody { get; set; }
}

– And write the following lines to index

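// GetSolr() is my helper that returns the SolrNet instance; lM holds the mails read from Outlook
ISolrOperations<MailSolr> solr = GetSolr();
int it = 0; // counts the mails processed so far
foreach (Microsoft.Office.Interop.Outlook.MailItem i in lM)
{
    it++;
    MailSolr m = new MailSolr();
    m.ItemId = i.EntryID;
    m.Subject = i.Subject;
    m.Sent = i.SentOn;
    m.SenderName = i.SenderName;
    m.Body = i.Body;
    m.HtmlBody = i.HTMLBody;

    // Queue the document; nothing is searchable until the commit below
    solr.Add(m);
}
solr.Commit();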

As you can see adding a document with SolrNet is dead easy. Simply connect to Solr, create a new instance of your POCO object, solr.Add() and solr.Commit(). How much easier would you want it to be?

Next Tuesday I will talk about the NIH syndrome (not invented here) around SolrNet.