How to change the Solr request handler with SolrNet and ServiceLocator

by Xavier Comments: 1

I am working on a project that sounds like heaven to me: big company, hundreds of developers, latest technology all around, totally agile, and search done with Solr behind a REST API in C#, which of course uses SolrNet (who would think otherwise?).

In any case, spellcheck was enabled, and this wreaked havoc whenever servers were rebooted. It seems that SolrCloud has an issue with spellchecking: configuring spellcheck in the /select request handler makes Solr spin its wheels for a long time while starting. The issue is tracked in https://issues.apache.org/jira/browse/SOLR-6679. The recommended workaround is to set up spellcheck in a different request handler.

But here is the problem: SolrNet gives you no easy way to specify the request handler explicitly. It basically always uses /select. The request handler is specified via the Handler property of SolrQueryExecuter (the default ISolrQueryExecuter implementation). You can see it in action here:

https://github.com/mausch/SolrNet/blob/master/SolrNet/Impl/SolrQueryExecuter.cs

I checked through many forums and threads to try to get to a solution and here are some of the threads I found:

Changing Handler endpoint in SolrQueryExecuter? https://groups.google.com/forum/?fromgroups=#!searchin/solrnet/handler/solrnet/Kqxn68pU0uo/uG50WSxu_swJ

How can I perform solrnet query in two different request handler? https://groups.google.com/forum/#!topic/solrnet/ZA-bv9dkh_0

Different request handler https://groups.google.com/forum/#!topic/solrnet/SP14XmifcrY

Calling Custom Request Handler https://groups.google.com/forum/#!topic/solrnet/THX-ADS5CLQ

http://stackoverflow.com/questions/13393700/how-we-changes-standard-query-handler

There were many recommendations, among them:

– Use qt. The problem is that it is deprecated, and I think it will not be available in Solr 5. There is also a lot of pushback against qt.

– Move to Castle Windsor, which is what Eduardo, a friend of mine, did last week. I don’t have enough time to do this; I am on a really tight schedule.

– Replace the Handler property in SolrQueryExecuter, which is what was recommended in one of the threads.

– Or Remove() and re-Register() the ISolrQueryExecuter.

One of the recommendations was to modify the Handler this way:

Startup.Init<T>(new SolrConnection("http://localhost:8080/solr"));

var executor = ServiceLocator.Current.GetInstance<ISolrQueryExecuter<T>>() as SolrQueryExecuter<T>;

executor.Handler = "/new";

This did not work, as it did not modify the registered instance. I have no idea why, but if you know how to fix it, let me know.

So the recommendation from Mauricio Scheffer was to Remove and Reregister, which I did not know how to do but Satish, a very friendly developer, helped me. Here is the solution:

Startup.Init<MyDocument>("http://localhost:8080/solr");

var container = ServiceLocator.Current as SolrNet.Utils.Container;

// Remove the default query executer that Startup.Init registered.
container.Remove<ISolrQueryExecuter<MyDocument>>();

var instance = new SolrQueryExecuter<MyDocument>(container.GetInstance<ISolrAbstractResponseParser<MyDocument>>(), new SolrConnection("http://localhost:8080/solr"), container.GetInstance<ISolrQuerySerializer>(), container.GetInstance<ISolrFacetQuerySerializer>(), container.GetInstance<ISolrMoreLikeThisHandlerQueryResultsParser<MyDocument>>());

instance.Handler = "/yourhandler";

// Re-register the configured instance so that SolrNet actually uses it.
container.Register<ISolrQueryExecuter<MyDocument>>(c => instance);

And this saved my day!
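
If you need this for more than one document type, the same steps can be wrapped in a small generic helper. This is only a sketch: SolrHandlerSetup and InitWithHandler are names I made up, the using directives assume an older SolrNet build that ships Microsoft.Practices.ServiceLocation, and the cast only works when SolrNet's built-in container is in use.

```csharp
using Microsoft.Practices.ServiceLocation; // assumption: namespace used by older SolrNet builds
using SolrNet;
using SolrNet.Impl;
using SolrNet.Utils;

// Hypothetical helper, not part of SolrNet.
public static class SolrHandlerSetup
{
    // Initializes SolrNet for T, then swaps the default query executer
    // for one that sends queries to the given request handler.
    public static void InitWithHandler<T>(string solrUrl, string handler)
    {
        Startup.Init<T>(solrUrl);

        // Only valid with SolrNet's built-in container.
        var container = (Container)ServiceLocator.Current;
        container.Remove<ISolrQueryExecuter<T>>();

        var executer = new SolrQueryExecuter<T>(
            container.GetInstance<ISolrAbstractResponseParser<T>>(),
            new SolrConnection(solrUrl),
            container.GetInstance<ISolrQuerySerializer>(),
            container.GetInstance<ISolrFacetQuerySerializer>(),
            container.GetInstance<ISolrMoreLikeThisHandlerQueryResultsParser<T>>())
        {
            Handler = handler
        };

        // Register the configured instance so later lookups return it.
        container.Register<ISolrQueryExecuter<T>>(c => executer);
    }
}
```

Call SolrHandlerSetup.InitWithHandler<MyDocument>("http://localhost:8080/solr", "/yourhandler") once at startup, and resolve ISolrOperations<MyDocument> only after that call, so that it picks up the replaced executer.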

Should I Rename the uniquekey in Solr?

by Xavier Comments: 1

I was asked today if it makes sense to change the name of the default id field in Solr to something else.

Let me explain. You download Solr, you get it up and running, and you start modifying fields in schema.xml. By default the field is called id and it is declared as the uniqueKey. No harm done in leaving it as is. It looks something like what I have below.

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

….

<uniqueKey>id</uniqueKey>

At first it may seem like it doesn’t matter what it is actually called, however if you think about it a little bit more it may make sense to change it.

Why?

Simple. Because id is the default name, it may be used in other places, even in sample code, and it may have been used for specific purposes or in ways that you are not aware of.

So by changing it, you are forcing yourself to be aware of where the id field is being used. It is not a huge difference, but the devil is in the details.

Something as easy as changing it to look like this is more than enough.

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />

….

<uniqueKey>itemid</uniqueKey>

Want a more precise example? Run SolrNet’s sample app and just change the name of the uniquekey. Tell me how it went!

Using Solr from C# Made Dead Easy with SolrNet

by Xavier Comments: 2

I was having a conversation with a solutions architect a few days ago about a specific Solr project with .NET. He was telling me some of the details of what they were going to build, how they were going to call Solr’s REST API for querying and ….

Stop the press… Stop right there… I said….

You are going to call Solr’s REST API? Really? The fact that Solr has a REST API is massively useful, but why do you want to reinvent the wheel?

If you want to use Solr from .NET, I explained, the best way is to use SolrNet (https://github.com/mausch/SolrNet). SolrNet is an Apache Solr client for .NET that lets you use Solr in a super easy and efficient way. It abstracts Solr so well that you basically just create a POCO (Plain Old CLR Object, not Plain Old C# Object as many people wrongly call it) and presto, you have functions to Add, Delete, Commit and more. Let me show you a small sample I built about a week ago, in literally just a few minutes, to index a few thousand mails from the author distribution list of Pluralsight (the best online training resource available IMHO, and yes, I am a bit biased here!).

To create this demo, where I indexed about 4k emails, I did the following:

– Download Solr, 4.10 at the time [takes a few minutes to download]

– Clone collection1 and rename it authorslist, don’t forget to change the collection name

– Change schema.xml so that you have the following fields (remember, this is just a quick and dirty example that I built to show how to provide a searchable index of all author mails inside a WordPress site). It looks like this:

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="sent" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="sendername" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="recipients" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="htmlbody" type="text_general" indexed="false" stored="true"/>

– Make itemid uniquekey instead of id

– Modify solrconfig.xml to add a new requesthandler, so that I can play a bit with facets and weights, nothing out of the ordinary

– And now to index the mails I just wrote a few lines of code to read from my Outlook, using the Office Primary Interop Assemblies (known as PIAs) in a console application

– Add SolrNet to the console app

– Create a POCO to represent whatever I added to the schema. Please notice the attributes, which are what tell SolrNet which Solr field each property represents. This is my POCO:

using System;
using System.Collections.Generic;
using SolrNet.Attributes;

public class MailSolr {

    // Maps to the uniqueKey field declared in schema.xml.
    [SolrUniqueKey("itemid")]
    public string ItemId { get; set; }

    [SolrField("subject")]
    public string Subject { get; set; }

    [SolrField("sent")]
    public DateTime Sent { get; set; }

    [SolrField("sendername")]
    public string SenderName { get; set; }

    // Multi-valued field in the schema, so it is a collection here.
    [SolrField("recipients")]
    public ICollection<string> Recipients { get; set; }

    [SolrField("body")]
    public string Body { get; set; }

    [SolrField("htmlbody")]
    public string HtmlBody { get; set; }
}

– And write the following lines to index

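// GetSolr() returns the SolrNet operations object for MailSolr documents.
ISolrOperations<MailSolr> solr = GetSolr();
int it = 0; // running count of processed mails
// lM is the list of Outlook mail items read through the PIAs.
foreach (Microsoft.Office.Interop.Outlook.MailItem i in lM)
{
    it++;
    MailSolr m = new MailSolr();
    m.ItemId = i.EntryID;
    m.Subject = i.Subject;
    m.Sent = i.SentOn;
    m.SenderName = i.SenderName;
    m.Body = i.Body;
    m.HtmlBody = i.HTMLBody;

    solr.Add(m);
}
// A single commit at the end makes all added documents searchable.
solr.Commit();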

As you can see adding a document with SolrNet is dead easy. Simply connect to Solr, create a new instance of your POCO object, solr.Add() and solr.Commit(). How much easier would you want it to be?
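
The snippet above calls GetSolr() without showing it. Here is a minimal sketch of what such a helper could look like, using SolrNet's built-in initialization. SolrSetup is a name I made up for this illustration, the URL in the usage note is an assumption about where your core lives, and the ServiceLocator namespace depends on the SolrNet version you reference.

```csharp
using Microsoft.Practices.ServiceLocation; // assumption: namespace used by older SolrNet builds
using SolrNet;

// Hypothetical helper, not part of SolrNet.
public static class SolrSetup<T>
{
    // Static fields in a generic class are tracked per document type T.
    private static bool initialized;

    // Returns the SolrNet operations object for T, initializing SolrNet
    // on the first call only. Not thread-safe; call it from startup code.
    public static ISolrOperations<T> GetSolr(string solrUrl)
    {
        if (!initialized)
        {
            // Registers the mapping and connection for T in SolrNet's container.
            Startup.Init<T>(solrUrl);
            initialized = true;
        }
        return ServiceLocator.Current.GetInstance<ISolrOperations<T>>();
    }
}
```

With this in place, GetSolr() in the indexing code becomes SolrSetup<MailSolr>.GetSolr("http://localhost:8983/solr/authorslist"), and once the mails are committed you can search them with something like solr.Query(new SolrQueryByField("subject", "solr")).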

Next Tuesday I will talk about the NIH syndrome (not invented here) around SolrNet.

Search is one of the most misunderstood functionalities in IT

by Xavier Comments: 0

There is a phrase I use all the time: “Search is one of the most misunderstood functionalities in IT”. And I think it is very accurate.

The problem lies in two different aspects:

  1. Developers don’t know how to use search engines. And it is ok, search engines can be hard to tune appropriately and it is a specialised niche. In some cases, there are some search engines which are awfully expensive.
  2. Developers are lazy. Let me explain this one.

Let’s say that I am setting up an application for selling cars. Potential customers always look for the same things, which are make, model, year, sort by price and so on and so forth. There is a set of meta data that is important and required to find what you are looking for. So what is the solution to this problem?

Use a database where each field is stored in a separate column and look for the fields accordingly, just like in the following image. It is a mistake or at least a UX horror. I hate database driven search, but that is just my personal opinion.

A typical database driven search input

The correct way of doing it is by providing a single search box. How? Like this:

A proper search box

If you want to learn how, please click on the following link to my Pluralsight course to get started with enterprise search using Apache Solr!

pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

Installing Solr in Windows or Linux?

by Xavier Comments: 2

I have been a fan of Microsoft technologies all my life, probably because I’ve spent a lot of time working with .NET and related technologies. Eventually I also became an Apple fanboy, as some people have called me.

But something that I haven’t been called a fan of is Linux. Don’t get me wrong, I think Linux is extremely important, but in my case I have not worked with it as much as I think I should have.

But now I am in a part of my life where I need to run Apache Solr in a production environment. What do I do? What comes naturally.

In a nutshell, I set up a Windows machine in Amazon AWS, install Java, download Solr, run java -jar start.jar, modify solrconfig.xml, modify schema.xml, turn a few more knobs and test. Once I am happy, I install Tomcat and voilà, I have a single node for production. It is a small application with very few documents and reasonable traffic, so it is all good. And besides, it is amazing how much a Solr instance in AWS can handle.

Anyway, my need keeps growing and I believe I need to set up a more resilient installation. Of course SolrCloud comes to mind, but I am thinking of how the pros install Solr.

So what do I do? Install Solr on a Linux AMI. Also, as I now need monitoring in place, I set up SemaText. One downside of Windows is that, at least with SemaText, you can’t monitor a Windows host, only Linux.

And there you go, that is my piece of advice. But it is not only from me: I’ve heard from many sources that Linux can be more performant and stable when running Apache Solr.

If you want more information on how to install Solr on a Linux instance, please follow this link to the Apache Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production

Also, if you want to learn more about Getting Started with Enterprise Search with Apache Solr, please follow this link to my course on this subject:
www.pluralsight.com/courses/discussion/enterprise-search-using-apache-solr

The Importance of Networking and Good People

by Xavier Comments: 0

Being an entrepreneur is hard. I have several things going at once (yes, a mistake) but I am moving forward. One of the key areas where I put a good amount of effort is creating Pluralsight trainings. One of them, where I put in a huge amount of work, is “Getting Started with Enterprise Search Using Apache Solr”, which takes a dev with zero experience in Solr and a bit of .NET and in 3.4 hours teaches him or her how to build a working POC-style project with Solr and a .NET MVC UI.

You can watch the training here: pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

Getting to the point, Pluralsight recently acquired CodeSchool, and to celebrate they opened their library for free for 72 hours. So I announced in a couple of LinkedIn groups that the course on Solr would be free during this time, in case they wanted to take advantage of the offer.

I got a huge surprise when I saw a newsletter from Solr-Start (www.solr-start.com) announcing this. It turns out that Alexandre Rafalovitch, a well-known Solr popularizer and author, saw my notice and blasted off an email to his crowd.

It feels great when a good author shares your news in a newsletter! I didn’t even ask him to do this; he did it on his own, and for that I really have to thank him.

And by the way, if you are just getting started with Solr, his book Instant Apache Solr for Indexing Data How-to is an excellent resource that can help you understand how to index data. It has a lot of great tips and examples. I got it from Amazon a while back and it has helped me greatly. 100% recommended!

You can get it here:

https://www.packtpub.com/big-data-and-business-intelligence/instant-apache-solr-indexing-data-how-instant

Or on Amazon.com, and as you can see I bought it 1 year ago.

Getting Started with Enterprise Search Using Apache Solr

by Xavier Comments: 0

Enterprise search used to be not for the faint of heart or for those with a thin wallet. However, since the introduction of Apache Solr the name of the game has changed. Solr brings high quality enterprise search to the masses. Don’t leave home without it!

And let me help you get started! My intention is to create a series of posts where I can help you get started with Solr. This process can be easy if tackled with the appropriate resources, but it can be daunting if you choose the wrong ones.

I will start by describing what each module of my training covers. Click on a bullet to be taken directly to the post.

  • Why Solr & Enterprise Search?
  • Architecture of an Enterprise Search Application
  • Solr Configuration
  • Content: Schemas, Documents and Indexing
  • Searching & Relevance
  • Making it all Work: Put a UI on It!
  • Final Words

My course is available on Pluralsight: Getting Started with Enterprise Search Using Apache Solr. You can watch it here:
http://pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

Solr training in Pluralsight