SolrNet Release 0.5.1: Fix Spellcheck Parser Issue

by Xavier Comments: 0

SolrNet, the C# client for Apache Solr, has a new release: 0.5.1. This release addresses a breaking change in the latest versions of Solr 4.x, in which Solr returns multiple collations. I am currently working on getting it onto NuGet.

This is the release:

SolrNet Release 0.5.1: Fix Spellcheck Parser Issue

Let me show you quickly with an example. Here is how a single collation was returned before:
<response>
  <result numFound="1" start="0">
    <doc>
      <str name="Key">224fbdc1-12df-4520-9fbe-dd91f916eba1</str>
    </doc>
  </result>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="hell">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">4</int>
        <arr name="suggestion">
          <str>dell</str>
        </arr>
      </lst>
      <lst name="ultrashar">
        <int name="numFound">1</int>
        <int name="startOffset">5</int>
        <int name="endOffset">14</int>
        <arr name="suggestion">
          <str>ultrasharp</str>
        </arr>
      </lst>
      <str name="collation">dell ultrasharp</str>
    </lst>
  </lst>
</response>

And then with later versions of Solr 4.x multiple collations were returned:

<response>
  <result name="response" numFound="0" start="0"></result>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="produtc">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>product</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">product</str>
        <int name="hits">1000</int>
        <lst name="misspellingsAndCorrections">
          <str name="produtc">product</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

The catch was that SolrNet would fail while parsing this response because the numFound node was not found: the new collation entry is a lst that has no numFound child. This issue is now fixed.
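To make the difference between the two response shapes concrete, here is a minimal sketch that reads collations from either one. It is not SolrNet's actual parser; the class and method names (CollationReader, ReadCollations) are made up for illustration, and it only uses System.Xml.Linq from the .NET base class library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CollationReader
{
    // Returns every collation query found under <lst name="suggestions">,
    // whether Solr sent the old <str name="collation"> form or the newer
    // <lst name="collation"> form with a nested collationQuery.
    public static List<string> ReadCollations(string responseXml)
    {
        var collations = new List<string>();
        var doc = XDocument.Parse(responseXml);

        var suggestions = doc.Descendants("lst")
            .FirstOrDefault(e => (string)e.Attribute("name") == "suggestions");
        if (suggestions == null)
            return collations;

        var collationNodes = suggestions.Elements()
            .Where(e => (string)e.Attribute("name") == "collation");

        foreach (var node in collationNodes)
        {
            if (node.Name.LocalName == "str")
            {
                // Old shape: the collation text is the element value.
                collations.Add(node.Value);
            }
            else
            {
                // New shape: look for the collationQuery child instead of
                // assuming suggestion fields such as numFound are present.
                var query = node.Elements("str")
                    .FirstOrDefault(e => (string)e.Attribute("name") == "collationQuery");
                if (query != null)
                    collations.Add(query.Value);
            }
        }
        return collations;
    }
}
```

Calling CollationReader.ReadCollations with the first response above should yield "dell ultrasharp", and with the second one "product".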

This is the first release I have done on SolrNet since I was granted permission to publish new releases. I am still getting up to speed as I work through the backlog of improvements, including support for newer releases of Solr.

If you have any questions, don’t hesitate to contact me via this blog or @xmorera on Twitter.

 

The Art of Creating Applications That Have Search

by Xavier Comments: 0

In my Pluralsight courses, Getting Started with Enterprise Search using Apache Solr and Implementing Search in .NET Applications, one of the things I emphasize quite a bit is how important search is, and yet it is one of the most misunderstood functions of IT and development in general. In this post I will show you an example of how a potentially good app ends up being a pretty bad app, mainly because of its search capabilities.

So much so that Pluralsight picked this phrase for the tweet announcing the release of my course, as you can see here:

[Image: searchiseverywhere]

But now let’s get to the sample. Here’s the scenario:

Problem: Life is busy. No time to go to the supermarket.

Solution: Use your grocery store’s web site to purchase your food and have it delivered to your home the next day. It is a charming idea. It did not work for Webvan, but it seems to be doing quite well for Amazon, and in my home town one of the major supermarkets is doing it in a more controlled way with a good delivery service, all for $10. Not too scalable, but for an MVP it is OK. (Read The Lean Startup if you don’t know what an MVP is.)

It may work, or it may not, mainly because of a really bad user experience. But let me get to the point. UX is important! Never forget it!

You get to the app at https://www.automercado.co.cr/aam/showMain.do and it has four main sections, as you can see here:

[Screenshot: auto]

And here is what they are for:
– On the left they have a directory-style listing organized by aisle. Grouping kind of works, in my opinion, if you are not too sure of what you want, but it is terribly slow and inefficient. They lose cookie points for this.

[Screenshot: 2014-07-02_0638]

– Then in the middle they have a section where they display the products. This is very standard, so it gets a pass. However, they lose cookie points again for having products without pictures or with very weird stretching. They are a supermarket, and a big one, so I am sure they can send a guy with an iPhone to take a quick picture.

[Screenshot: 2014-07-02_0637]

– The cart has a problem: it does not actually display the product name, only the description. Who thought of this? Not even something as simple as a tooltip!

[Screenshot: 2014-07-02_0640]

And then here is the deal breaker for me: BAD SEARCH! As mentioned earlier in this post, search is one of the most misunderstood functions in IT. A lot of people make the huge mistake of assuming that search should be done with a database. It can be, but the end result sucks! And it did suck here.

Let me show you this. I want to look for “jabon dial”, which means “Dial Soap”. So I just type “Jabon Dial”. Should work, right? It doesn’t! Look at the message: “No results found…”. Also, I hate the CAPS. I can think of one technical reason for them, but it is a pretty dumb one.

[Screenshot: 2014-07-02_0646]

But why? If you type only “Dial” and look closely, there are 27 types of “Jabon Dial”:

[Screenshot: 2014-07-02_0649]

The problem lies here:
– The person who implemented this application had no knowledge of how search works, which is normal, as search is pretty misunderstood.
– But humans don’t search the way engineers want them to. Making the user search exactly the way the engineer expects is just lazy and ineffective.
– So the engineers who created this probably went for a simple exact match in a database search.
– This is a terrible user experience. I can bet the farm that Amazon would have closed its doors in the 1990s if they had such a bad search.
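To illustrate the kind of matching that is probably behind this behavior, here is a minimal sketch contrasting a verbatim lookup with a word-by-word lookup. I have no access to the site's code, so this is an assumption about how it works; the class name ProductSearch and the product name used below are hypothetical.

```csharp
using System;
using System.Linq;

public static class ProductSearch
{
    // Naive approach: the whole query must appear verbatim in the name,
    // much like a SQL LIKE '%query%' lookup.
    public static bool SubstringMatch(string productName, string query)
    {
        return productName.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Token approach: split both sides into words and require every
    // query word to appear somewhere in the name, in any order.
    public static bool AllTermsMatch(string productName, string query)
    {
        var separators = new[] { ' ' };
        var nameTerms = productName.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var queryTerms = query.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        return queryTerms.All(q =>
            nameTerms.Contains(q, StringComparer.OrdinalIgnoreCase));
    }
}
```

With a hypothetical product called "JABON LIQUIDO DIAL", SubstringMatch returns false for the query "Jabon Dial" because the two words are not adjacent, while AllTermsMatch returns true. A search engine such as Solr goes much further (analysis, relevance ranking, spellchecking), but tokenizing the query is the first step away from the exact-match trap.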

How to fix it? Well, go learn how to use a search engine. And that’s why I created my course, Getting Started With Enterprise Search Using Apache Solr: http://www.pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr

Easy way to do a Solr Core Reload

by Xavier Comments: 0

Many times I have stopped and restarted Solr just to reload a core. Yes, it is kind of a rookie move, as you can always go to the Admin UI, open Core Admin, and reload the core.

But what if you wanted to have a really fast way of reloading your core?

Just do it via the admin handler!
http://{SOLR IP}:{SOLR PORT}/solr/admin/cores?action=RELOAD&core={CORE NAME}

You can even add it to your code and make a simple call or, better yet, use SolrNet’s core admin functionality, which is documented here:

https://github.com/mausch/SolrNet/blob/master/Documentation/Core-admin.md
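If you just want the simple call from your own code, without SolrNet, a minimal sketch could look like the following. The class and method names (CoreReloader, BuildReloadUrl, ReloadAsync) are made up for illustration, the base URL is expected to end in /solr as in the URL above, and error handling is reduced to checking the HTTP status.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class CoreReloader
{
    // Builds the admin handler RELOAD URL for the given Solr base URL
    // (for example http://localhost:8983/solr) and core name.
    public static string BuildReloadUrl(string solrBaseUrl, string coreName)
    {
        return solrBaseUrl.TrimEnd('/')
            + "/admin/cores?action=RELOAD&core="
            + Uri.EscapeDataString(coreName);
    }

    // Sends the reload request and throws if Solr answers with an error status.
    public static async Task ReloadAsync(HttpClient client, string solrBaseUrl, string coreName)
    {
        var response = await client.GetAsync(BuildReloadUrl(solrBaseUrl, coreName));
        response.EnsureSuccessStatusCode();
    }
}
```

From an async method you would call it as await CoreReloader.ReloadAsync(new HttpClient(), "http://localhost:8983/solr", "authorslist"); where the host, port and core name are placeholders for your own setup.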

 

How to change the Solr request handler with SolrNet and ServiceLocator

by Xavier Comments: 1

I am working on a project that sounds like heaven to me. Big company, hundreds of developers, latest technology all around, totally agile, and the search is done with Solr and a REST API in C# which of course uses SolrNet (who would think otherwise?).

In any case, spellcheck was enabled and this wreaked havoc whenever servers were rebooted. It seems that SolrCloud has an issue with spellchecking. The problem is that configuring spellcheck in the /select request handler makes Solr spin its wheels for a long time while starting, and it has been tracked in https://issues.apache.org/jira/browse/SOLR-6679. The recommended workaround is to set up spellcheck in a different request handler.

But here is the problem. In SolrNet you can’t easily specify the request handler explicitly. It basically uses /select. The request handler is specified via the Handler property of SolrQueryExecuter. You can see it in action here:

https://github.com/mausch/SolrNet/blob/master/SolrNet/Impl/SolrQueryExecuter.cs

I checked through many forums and threads to try to get to a solution and here are some of the threads I found:

Changing Handler endpoint in SolrQueryExecuter? https://groups.google.com/forum/?fromgroups=#!searchin/solrnet/handler/solrnet/Kqxn68pU0uo/uG50WSxu_swJ

How can I perform solrnet query in two different request handler? https://groups.google.com/forum/#!topic/solrnet/ZA-bv9dkh_0

Different request handler https://groups.google.com/forum/#!topic/solrnet/SP14XmifcrY

Calling Custom Request Handler https://groups.google.com/forum/#!topic/solrnet/THX-ADS5CLQ

http://stackoverflow.com/questions/13393700/how-we-changes-standard-query-handler

There were many recommendations, among them:

– Use qt. The problem is that it is deprecated, and I think it will not be available in Solr 5. There is also a lot of pushback against qt.

– Move to Castle Windsor, which is what Eduardo, a friend of mine, did last week. I don’t have enough time to do this, as I am on a really tight schedule.

– Replace the Handler property in SolrQueryExecuter, which is what was recommended in one of the threads.

– Or Remove() and re-Register() the ISolrQueryExecuter.

One of the recommendations was to modify the Handler this way:

Startup.Init<T>(new SolrConnection("http://localhost:8080/solr"));

var executor = ServiceLocator.Current.GetInstance<ISolrQueryExecuter<T>>() as SolrQueryExecuter<T>;

executor.Handler = "/new";

This did not work, as it did not modify the registered instance. I have no idea why, but if you know how to fix it, let me know.

So the recommendation from Mauricio Scheffer was to Remove and Reregister, which I did not know how to do, but Satish, a very friendly developer, helped me. Here is the solution:

Startup.Init<MyDocument>("http://localhost:8080/solr");

// SolrNet's built-in container is what ServiceLocator.Current returns after Startup.Init
var container = ServiceLocator.Current as SolrNet.Utils.Container;

// Drop the default executer, which points at /select
container.Remove<ISolrQueryExecuter<MyDocument>>();

// Build a new executer from the pieces already registered in the container
var instance = new SolrQueryExecuter<MyDocument>(
    container.GetInstance<ISolrAbstractResponseParser<MyDocument>>(),
    new SolrConnection("http://localhost:8080/solr"),
    container.GetInstance<ISolrQuerySerializer>(),
    container.GetInstance<ISolrFacetQuerySerializer>(),
    container.GetInstance<ISolrMoreLikeThisHandlerQueryResultsParser<MyDocument>>());

instance.Handler = "/yourhandler";

// Re-register, so that the container hands out the executer with the new handler
container.Register<ISolrQueryExecuter<MyDocument>>(c => instance);

And this saved my day!

Using Solr from C# Made Dead Easy with SolrNet

by Xavier Comments: 2

I was having a conversation with a solutions architect a few days ago about a specific Solr project with .NET. He was telling me some of the details of what they were going to build, how they were going to call Solr’s REST API for querying and...

Stop the press… Stop right there… I said….

You are going to call Solr’s REST API? Really? The fact that Solr has a REST API is massively useful, but why do you want to reinvent the wheel?

If you want to use Solr from .NET, I explained, the best way is to use SolrNet (https://github.com/mausch/SolrNet). SolrNet is an Apache Solr client for .NET that lets you use Solr from .NET in a super easy and efficient way. It abstracts Solr so well that you basically just create a POCO object (Plain Old CLR Object, not Plain Old C# Object as many people wrongly call it) and presto, you have functions to Add, Delete, Commit and more. Let me show you a small sample I built about a week ago, in literally just a few minutes, to index a few thousand mails from the author distribution list of Pluralsight (the best online training resource available IMHO, and yes, I am a bit biased here!).

To create this demo, where I indexed about 4k emails, I did the following:

– Download Solr, 4.10 at the time [takes a few minutes to download]

– Clone collection1 and rename it authorslist. Don’t forget to change the collection name.

– Change schema.xml so that you have the following fields. Remember, this is just a quick and dirty example that I built to show how to provide a searchable index of all author mails inside a WordPress site. It looks like this:

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="sent" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="sendername" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="recipients" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="htmlbody" type="text_general" indexed="false" stored="true"/>

– Make itemid the uniqueKey instead of id

– Modify solrconfig.xml to add a new request handler, so that I can play a bit with facets and weights, nothing out of the ordinary

– And now, to index the mails, I just wrote a few lines of code to read from my Outlook, using the Office Primary Interop Assemblies (known as PIAs) in a console application

– Add SolrNet to the console app

– Create a POCO object to represent whatever I added to the schema. Please notice the attributes, which are what tell SolrNet which Solr field each property represents. This is my POCO object:

using System;
using System.Collections.Generic;
using SolrNet.Attributes;

public class MailSolr
{
    // Maps to the uniqueKey field defined in schema.xml
    [SolrUniqueKey("itemid")]
    public string ItemId { get; set; }

    [SolrField("subject")]
    public string Subject { get; set; }

    [SolrField("sent")]
    public DateTime Sent { get; set; }

    [SolrField("sendername")]
    public string SenderName { get; set; }

    // multiValued in the schema, so it is a collection here
    [SolrField("recipients")]
    public ICollection<string> Recipients { get; set; }

    [SolrField("body")]
    public string Body { get; set; }

    [SolrField("htmlbody")]
    public string HtmlBody { get; set; }
}

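– And write the following lines to index:

// GetSolr() (not shown) returns the ISolrOperations<MailSolr> instance,
// and lM (not shown) is the list of Outlook mail items to index.
ISolrOperations<MailSolr> solr = GetSolr();
int it = 0; // running count of processed mails
foreach (Microsoft.Office.Interop.Outlook.MailItem i in lM)
{
    it++;
    MailSolr m = new MailSolr();
    m.ItemId = i.EntryID;
    m.Subject = i.Subject;
    m.Sent = i.SentOn;
    m.SenderName = i.SenderName;
    m.Body = i.Body;
    m.HtmlBody = i.HTMLBody;

    solr.Add(m);
}
// Commit once at the end so the added documents become searchable
solr.Commit();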

As you can see, adding a document with SolrNet is dead easy. Simply connect to Solr, create a new instance of your POCO object, call solr.Add() and then solr.Commit(). How much easier would you want it to be?

Next Tuesday I will talk about the NIH syndrome (not invented here) around SolrNet.