Entrepreneur, Optimist, Learner, Pluralsight Author, Agile, Dev

Should I Rename the uniquekey in Solr?

I was asked today whether it makes sense to change the name of the default id field in Solr to something else.

Let me explain. You download Solr, get it up and running and start modifying fields in schema.xml. By default the unique key field is called id and it is declared as the uniqueKey. There is no harm in leaving it as is. It looks something like what I have below.

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

….

<uniqueKey>id</uniqueKey>

At first it may seem like it doesn’t matter what it is actually called, however if you think about it a little bit more it may make sense to change it.

Why?

Simple. Because id is the default name, it may be used in other places, even in sample code, for specific purposes or in ways that you are not aware of.

So by changing it, you are forcing yourself to be aware of where the id field is being used. It is not a huge difference, but the devil is in the details.

Something as easy as changing it to look like this is more than enough.

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />

….

<uniqueKey>itemid</uniqueKey>

Want a more precise example? Run SolrNet’s sample app and just change the name of the uniquekey. Tell me how it went!

The NIH Syndrome in Action: Calling Solr’s REST API

As I mentioned in my previous post, I was having a chat with a search architect on C# and Solr and he was telling me how they were going to call Solr using its REST API, which IMHO is not the best way to go.

I recommended SolrNet and I stick with this recommendation because you do not want to reinvent the wheel. SolrNet has been built over the course of several years. Plenty of people use it, and its authors have had the time to understand the functionality that Solr provides and code it accordingly. The problem that you will face if you implement everything via REST calls is that you need to take all possible scenarios into consideration, which in a lot of cases does not happen. And that’s when exceptions start to occur.
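To give a feeling for how much detail hides behind "just call the REST API", here is a minimal sketch of only the first step: building a select URL by hand. This is not how SolrNet works internally, just an illustration. The class and method names, the host and the core name mycore are all made up for this example, and it only covers URL encoding. Escaping Solr's own query syntax, error handling and response parsing would all still be on you.

```csharp
using System;
using System.Collections.Generic;

public static class RawSolrUrl
{
    // Builds a URL for Solr's select handler by hand.
    // baseUrl and core are hypothetical values supplied by the caller.
    public static string BuildSelectUrl(string baseUrl, string core, string query, int rows)
    {
        if (string.IsNullOrWhiteSpace(query))
        {
            query = "*:*"; // Solr's match-all query
        }

        var parameters = new List<string>
        {
            // Forgetting to URL-encode the query is a classic hand-rolled bug.
            "q=" + Uri.EscapeDataString(query),
            "rows=" + rows,
            "wt=json"
        };

        return baseUrl.TrimEnd('/') + "/" + core + "/select?" + string.Join("&", parameters);
    }
}
```

Calling RawSolrUrl.BuildSelectUrl("http://localhost:8983/solr", "mycore", "subject:\"release notes\"", 10) gives http://localhost:8983/solr/mycore/select?q=subject%3A%22release%20notes%22&rows=10&wt=json. And that is only the query string for one simple case, which is exactly why I prefer to lean on a library that has already been through all of this.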

In some cases it is not possible to use a library as there may be legal implications. But in other cases it is because of the NIH syndrome, or the not invented here syndrome.

Sometimes teams prefer to have everything created in house. In my opinion that may apply pretty well if you have a closed library as you are at the mercy of the company’s support and sometimes even their ability to profit and therefore continue to exist.

But if it is open source, like SolrNet, then don’t be afraid. You are in control, you can modify the code if need be – as with any open source, of course following the licensing terms – and you can definitely benefit from the experience of many others that have traveled the same road you are currently on.

But of course, be nice. If you make an improvement, please contribute back to the SolrNet community. Then we all mutually benefit.

Using Solr from C# Made Dead Easy with SolrNet

I was having a conversation with a solutions architect a few days ago about a specific Solr project with .NET. He was telling me some of the details of what they were going to build, how they were going to call Solr’s REST API for querying and ….

Stop the press… Stop right there… I said….

You are going to call Solr’s REST API? Really? The fact that Solr has a REST API is massively useful, but why do you want to reinvent the wheel?

If you want to use Solr from .NET, I explained, the best way is to use SolrNet (https://github.com/mausch/SolrNet). SolrNet is an Apache Solr client for .NET that lets you use Solr in a super easy and efficient way. It abstracts Solr so well that you basically just create a POCO (Plain Old CLR Object, not to be wrongly called Plain Old C# Object as many people do) and presto, you have functions to Add, Delete, Commit and more. Let me show you a small sample I built about a week ago in literally just a few minutes to index a few thousand mails from Pluralsight’s (the best online training resource available IMHO – and yes I am a bit biased here!) author distribution list.

To create this demo, where I indexed about 4k emails, I did the following:

– Download Solr, 4.10 at the time [takes a few minutes to download]

– Clone collection1 and rename it authorslist, don’t forget to change the collection name

– Change schema.xml so that you have the following fields (and remember, this is just a quick and dirty example I put together to show how to provide a searchable index of all author mails inside a Wordpress site). It looks like this:

<field name="itemid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="sent" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="sendername" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="recipients" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="htmlbody" type="text_general" indexed="false" stored="true"/>

– Make itemid uniquekey instead of id

– Modify solrconfig.xml to add a new requesthandler, so that I can play a bit with facets and weights, nothing out of the ordinary

– And now to index the mails I just wrote a few lines of code to read from my Outlook, using the Office Primary Interop Assemblies (known as PIAs) in a console application

– Add SolrNet to the console app

– Create a POCO to represent whatever I added to the schema. It should look like this. Please notice the attributes, which are what tell SolrNet which Solr field each property maps to. This is my POCO:

using System;
using System.Collections.Generic;
using SolrNet.Attributes;

public class MailSolr {

    // Maps to the field declared as uniqueKey in schema.xml
    [SolrUniqueKey("itemid")]
    public string ItemId { get; set; }

    [SolrField("subject")]
    public string Subject { get; set; }

    [SolrField("sent")]
    public DateTime Sent { get; set; }

    [SolrField("sendername")]
    public string SenderName { get; set; }

    // recipients is multiValued in schema.xml, hence a collection
    [SolrField("recipients")]
    public ICollection<string> Recipients { get; set; }

    [SolrField("body")]
    public string Body { get; set; }

    [SolrField("htmlbody")]
    public string HtmlBody { get; set; }
}

– And write the following lines to index

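// GetSolr() is my own helper and is not shown here. With SolrNet it would
// typically call Startup.Init<MailSolr>(solrUrl) once and then resolve
// ISolrOperations<MailSolr> through ServiceLocator.Current.GetInstance.
ISolrOperations<MailSolr> solr = GetSolr();
int it = 0; // running count of processed mails
// lM is the collection of Outlook mail items read through the PIAs (not shown)
foreach (Microsoft.Office.Interop.Outlook.MailItem i in lM)
{
    it++;
    MailSolr m = new MailSolr();
    m.ItemId = i.EntryID;
    m.Subject = i.Subject;
    m.Sent = i.SentOn;
    m.SenderName = i.SenderName;
    m.Body = i.Body;
    m.HtmlBody = i.HTMLBody;

    solr.Add(m); // send the document to Solr
}
solr.Commit(); // make the added documents searchable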

As you can see, adding a document with SolrNet is dead easy. Simply connect to Solr, create a new instance of your POCO, then call solr.Add() and solr.Commit(). How much easier would you want it to be?

Next Tuesday I will talk about the NIH syndrome (not invented here) around SolrNet.

Five ways meetings suck and how to make them rock!

Meetings are a double-edged sword. On one hand they are very useful for brainstorming and for communication within and between teams, and in general, when used appropriately, they spark collaboration and move projects forward.

The problem arises when meetings are misused or abused, which sadly happens more often than not. And that’s why I want to tell you 5 reasons why meetings suck and how to make them rock!

#1 People use them to keep busy vs being productive: I’ve seen many times how people set up meetings because for them going to the meeting is the work. Even worse, a lot of meetings end with “let’s set up another meeting to continue the discussion”. This is very common among Project Managers or Product Managers. Just as a fun mental exercise, take a couple of minutes and think if you know a few. I do.

How to deal with it: Make sure that every meeting that you attend or have control over has a real and very clear objective. Even more, make sure that it is important that this meeting takes place. If the objective is not important or required, then simply defer it until it is the right time.

#2 A 1 hour meeting is not a 1 hour meeting: Have you ever been in a 1 hour meeting with 15 other folks, wondering how much work you could be getting done instead of just sitting there waiting for the 1 hour mark or your 2 minute turn to talk? Does the guy next to you seem to be thinking the same while playing Candy Crush or, in the best case, answering emails? Well, it is much worse than that. A 1 hour meeting with 16 people costs 16 person-hours, which is 2 full 8-hour working days lost forever, gone in time. I shiver just to think of how many dollars are wasted because of overcrowded meetings!

How to deal with it: Not everyone needs to be invited to every meeting. Make sure each person has a clear and known reason for being there and (in a friendly way) kick out of the room anyone who doesn’t. Also, as the old adage says, “divide and conquer”. A very broad topic where too many people are involved can be broken down into smaller chunks with smaller groups. You only need one person to coordinate among the teams, and 1 is always much better than a committee. This is very common with Scrums that get abused and end up looking more like office parties. Bonus points if people bring food as that distracts even further!

#3 Meetings without agenda: Another reason why some meetings take much longer than expected is that there isn’t a clear agenda and objective. This leads to endless talking, going off on tangents and, again, scheduling new meetings to continue the conversation. People also confuse a meeting agenda with just the subject of the meeting. For example, “Discussion on data” is not clear; it is broad and subjective. This scenario goes hand in hand with an overall project vision that is not very clear or has many unknowns.

How to deal with it: Create a clear agenda with well defined points and if possible, use my favorite feature of Scrum: timebox. Parkinson’s law is the adage that “work expands so as to fill the time available for its completion” and this applies very well to meetings. If you don’t have a limit, people will use all time possible for a discussion. Timebox in a reasonable way and you may see wonderful results. And if the issue is overall project vision, try planning shorter term until there is a clearer view of the road ahead.

#4 Interruptions break flow for makers: If you work hard on improving yourself you are probably familiar with the concept of “flow”, but if you are not then internalize this: “flow is the mental state of operation in which a person performing an activity is fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity”. This is when you get a lot done. Developers usually experience this without knowing it, as those moments when “time flies”. In reality, what happened was that the brain was fully focused and engaged in a task, with potentially wonderful results.

There is something else that you should be familiar with: the “Maker’s Schedule, Manager’s Schedule”. Let me explain. If you are a maker, e.g. a programmer, you most likely work in large chunks of time, for example from 8 am to noon. During that time you focus on solving a problem and if you are interrupted, it may take up to 15 minutes to pick up where you left off. This means that if there are multiple meetings throughout the day, your maker’s schedule gets broken down into many small chunks that do not allow for focused, full involvement in a task. I worked on multiple projects in Microsoft’s main campus in Seattle and it seemed like everyone in this particular division worked in 1 hour intervals, which wreaked havoc on development work. Culprit: too many (project) managers in the same building. Real development took place somewhere else.

How to deal with the manager’s schedule:

Managers usually have smaller chunks of work that can be broken down into 1 hour slots, hence their impulse to “book meetings with the team to catch up”. They may have the best intentions, but this usually has a negative impact. Even worse, managers are higher in the food chain, so let me ask you this: when was the last time that you simply cancelled a meeting with your boss because he was interrupting you?

How to deal with the maker’s schedule: The first step is for the manager to understand the impact he is having on his whole team with constant interruptions. This sometimes requires the maker to approach the manager and explain the implications of the constant interruptions. Then the manager should aim to schedule (required) meetings when they cause the least impact, which can be near the start of the day, around noon or at the end of the day. Also, use asynchronous communication. This sounds complex, but it is basically email, chat or, even better, something like Jira to track the makers’ progress.

#5 Another chance for talkers to take the stage, not doers: This is a very tricky one as it requires a leader who is aware of the capabilities and responsibilities of the team members attending the meeting. The problem usually lies in the fact that human personalities are extremely different, so some people tend to talk a lot while others prefer silence. And talking is not directly proportional to doing. Someone might talk a lot and do very little, while the actual guy doing the work will just sit there and listen. This then creates an unbalanced and biased view, as not all of the involved parties will communicate appropriately and the outcome of the meeting might not be the best.

How to deal with it: It is the meeting organizer’s responsibility to understand what is the involvement of each one of the parties invited to the meeting in respect to the meeting agenda. Then ask questions specifically on each one of the topics to get the correct information and help obtain the best possible outcome for the meeting.

In summary, meetings can help your team leap forward. Just remember to schedule meetings only when absolutely required instead of as a way to keep busy, to invite only those who are definitely required, to set a very clear agenda and objective, to pick times that don’t interrupt the maker’s schedule and to guide the meeting so that everyone contributes their share.

Any other recommendations?

Search is one of the most misunderstood functionalities in IT

There is a phrase I use all the time: “Search is one of the most misunderstood functionalities in IT”. And I think it is very accurate.

The problem lies in two different aspects:

  1. Developers don’t know how to use search engines. And it is ok, search engines can be hard to tune appropriately and it is a specialised niche. In some cases, there are some search engines which are awfully expensive.
  2. Developers are lazy. Let me explain this one.

Let’s say that I am setting up an application for selling cars. Potential customers always look for the same things, which are make, model, year, sort by price and so on and so forth. There is a set of meta data that is important and required to find what you are looking for. So what is the solution to this problem?

Use a database where each field is stored in a separate column and look for the fields accordingly, just like in the following image. It is a mistake or at least a UX horror. I hate database driven search, but that is just my personal opinion.

A typical database driven search input

The correct way of doing it is by providing a single search box. How? Like this:

A proper search box

If you want to learn how, please click on the following link to my Pluralsight course to get started with enterprise search using Apache Solr!

pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

 

Installing Solr in Windows or Linux?

I have been a fan of Microsoft technologies all my life, probably because I’ve spent a lot of time working with .Net and related technologies. Eventually I became also an Apple fanboy as some people have called me.

But something that I haven’t been called a fan of is Linux. Don’t get me wrong, I think Linux is extremely important, but in my case I have not worked with it as much as I think I should have.

But now I am in a part of my life where I need to run Apache Solr in a production environment. What do I do? What comes naturally.

In a nutshell, I set up a Windows machine in Amazon AWS, install Java, download Solr, run java -jar start.jar, modify solrconfig.xml, modify schema.xml, turn a few more knobs and test. Once I am happy I install Tomcat and voila, I have a single node for production. It is a small application with very few documents and reasonable traffic, so it is all good. And besides, it is amazing how much a Solr instance in AWS can handle.

Anyway, my need keeps growing and I believe I need to set up a more resilient installation. Of course SolrCloud comes to mind, but I am thinking of how the pros install Solr.

So what do I do? Install Solr in a Linux AMI. Also, as I now need monitoring in place, I set up SemaText. One downside of Windows is that, at least with SemaText, you can’t monitor on Windows, only on Linux.

And there you go, that is my piece of advice. But it is not only from me: I’ve heard from many sources that Linux can be more performant and stable when running Apache Solr.

If you want more information on how to install Solr in a Linux instance, please follow this link to the Apache Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production

Also, if you want to learn more about Getting Started with Enterprise Search with Apache Solr, please follow this link to my course on this subject:
www.pluralsight.com/courses/discussion/enterprise-search-using-apache-solr

The Importance of Networking and Good People

Being an entrepreneur is hard. I have several things going at once (yes, mistake) but I am moving forward. One of the key areas where I put a good amount of effort is creating Pluralsight trainings. And one of my trainings, where I put in a huge amount of work, is “Getting Started with Enterprise Search Using Apache Solr”, which takes a dev with 0 experience in Solr and a bit of .NET and in 3.4 hours teaches him or her how to build a working POC style project with Solr and a .NET MVC UI.

You can watch the training here: pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

Getting to the point, Pluralsight recently acquired CodeSchool and to celebrate they opened their library for 72 hours for free. So I announced in a couple of LinkedIn groups that the course on Solr would be free during this time, in case people wanted to take advantage of the offer.

I got a huge surprise when I saw a newsletter from Solr-Start (www.solr-start.com) announcing this. It turns out that Alexandre Rafalovitch, a well known Solr popularizer and author, saw my notice and blasted off an email to his crowd.

It feels great when a good author shares your news over a newsletter! I hadn’t even asked him to do this; he did it on his own, and for that I really have to thank him.

And by the way, if you are just getting started with Solr, his book Instant Apache Solr for Indexing Data How-to is an excellent resource that can help you understand how to index data. It has a lot of great tips and examples. I got it from Amazon a while back and it has helped me greatly. 100% recommended!

You can get it here:

https://www.packtpub.com/big-data-and-business-intelligence/instant-apache-solr-indexing-data-how-instant

Or on Amazon.com, and as you can see I bought it 1 year ago.


Find out who is connecting to your database – lovely query!

So I needed to figure out which servers are connecting to which databases. That sounds like a complicated thing but it isn’t!

Just run this query (and make sure you have appropriate permissions)

SELECT
    loginame, hostname, program_name, DB_NAME(dbid), last_batch
FROM
    sys.sysprocesses
WHERE hostname <> ''
ORDER BY last_batch DESC

Atlassian Summit Presentation Video – Collaboration is More than Communication

Collaboration is defined as “the action of working with someone to produce or create something.” Yet, many confuse communicating with collaborating. True collaboration gives you and your project an edge by aligning efforts towards a clear objective. I’ll show how teams can achieve true collaboration with JIRA Agile.

And here is my presentation from the Atlassian Summit 2014 on this topic


You can view it in the Atlassian archives: https://summit.atlassian.com/archives/2014/collaboration-teams/collaboration-is-more-than-communication-jira-agile

Hope you enjoy!

Best Practice and Development Tip: Don’t Reinvent the Wheel in C#

This is just a quick tip and development best practice based on a few things I’ve found while fixing bugs in an application. It is not just a quick tip on how to get the extension of a file, but instead it is about not reinventing the wheel, thinking about all possibilities and outcomes when you are programming and in general doing things right.

The idea is that whenever you have a problem to solve, for example getting the extension of a given file, you should find the appropriate framework function instead of trying to solve it on your own. Someone has almost certainly already spent a lot of time creating a function that handles many potential scenarios.

Here is what I found:

Don't reinvent the wheel

What is the problem? For any file name that contains more than one “.”, the extension is extracted incorrectly, as you can see.

How should I handle this? Well, maybe you are thinking “oh, look for the first “.”, but from right to left!”

Hmmmm yes…maybe… but no sale.

Instead you should use the appropriate framework libraries. Read this: http://msdn.microsoft.com/en-us/library/system.io.path.getextension(v=vs.110).aspx

Path.GetExtension Method

.NET Framework 4.5


Returns the extension of the specified path string.

Namespace:  System.IO
Assembly:  mscorlib (in mscorlib.dll)

Syntax

C#

public static string GetExtension(string path)

Parameters

path

Type: System.String

The path string from which to get the extension.

Return Value

Type: System.String
The extension of the specified path (including the period “.”), or null, or String.Empty. If path is null, GetExtension returns null. If path does not have extension information, GetExtension returns String.Empty.
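To make the difference concrete, here is a small sketch. The NaiveExtension method is a hypothetical hand-rolled version that I wrote for this comparison (it is not the code I found in the application), and the file name is made up. It simply takes whatever follows the first “.”, which is the kind of shortcut that breaks as soon as a name has more than one.

```csharp
using System;
using System.IO;

public static class ExtensionDemo
{
    // Hypothetical hand-rolled approach: everything after the FIRST dot.
    // Works for "photo.jpg" but not for names that contain several dots.
    public static string NaiveExtension(string fileName)
    {
        int firstDot = fileName.IndexOf('.');
        return firstDot < 0 ? string.Empty : fileName.Substring(firstDot);
    }

    // Framework approach: let Path.GetExtension deal with the edge cases.
    public static string FrameworkExtension(string fileName)
    {
        return Path.GetExtension(fileName);
    }
}
```

For "report.2014.final.pdf" the naive version returns ".2014.final.pdf" while Path.GetExtension returns ".pdf". For a name with no extension, such as "README", the framework method returns String.Empty, just as the documentation above describes.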

And always try to do the same. Think about all the possibilities and, whenever possible, check whether you are reinventing the wheel.
