Welcome to Big Data TV – Or The One That Started It All

by Xavier Comments: 0

Hello and welcome, I am Xavier Morera and I am very passionate about helping developers understand enterprise search and Big Data.

And today, I welcome you to the first post of the Big Data Inc Series (which will soon be joined by Big Data TV).

So, you might be wondering… what is the Big Data Inc Series?  Easy. It is a series of bite size posts that explain enterprise search and Big Data.

What is my objective? At a high level, each post will take between 5 and 7 minutes and will provide an overview of one particular topic – and only one – to give you enough information to understand the purpose of a particular platform, language, project, or anything else that touches enterprise search and Big Data.

Why am I doing this? First of all, I am really passionate about search and Big Data… like a kid on Christmas day. I do have to agree that I have my preferred platforms, languages, and projects.  However, it does not hurt to have an idea of what each one is about.

Also, why are the posts so short? Well, I could go on and on for hours – believe me, or at least my friends who say that a 45 minute presentation for me is just like warming up – but the point is that I want to be very concise, straight to the point, and give you an overall idea. The posts in the Big Data Series are not meant to be tutorials. For training, I have several courses at Pluralsight that cover topics like Spark, Cloudera CDH, Solr, Hue, Hive, JSON, code profiling and more, and I have also delivered and helped with trainings for Cloudera, Microsoft/HP/Intel.

I will cover a topic, give you a general idea, and let you decide if this is a technology that could be useful in your toolbelt. In many cases, I will point you in the direction of where to go learn more or I will tell you a story or two of how these technologies are used in real life.

So please join me on this journey with the Big Data Series. In our next post, we will talk about how Big Data started, with Hadoop. Also don’t forget to subscribe to be notified of newly released posts and videos, and to like and share. You can also follow the links below in the description.

And as we Costa Ricans say, pura vida!

 

The Art of Creating Applications That Have Search

by Xavier Comments: 0

In my Pluralsight trainings, Getting Started with Enterprise Search using Apache Solr and Implementing Search in .NET Applications, one of the things I emphasize quite a bit is how important search is, and yet it is one of the most misunderstood functions of IT and development in general. In this post I will show you an example of how a potentially good app ends up being a pretty bad app, mainly because of its search capabilities.

So much so that Pluralsight selected this phrase to tweet about the release of my course, as you can see here:

searchiseverywhere

But now let’s get to the sample. Here’s the scenario:

Problem: Life is busy. No time to go to the supermarket

Solution: Use your grocery store’s web site to purchase your food and have it delivered home the next day. A charming idea. It did not work for Webvan, but it seems to be doing quite well for Amazon, and in my home town one of the major supermarkets is doing it in a more controlled way with a good delivery service, all for $10. Not too scalable, but for an MVP it is ok. (Read Lean Startup if you don’t know what an MVP is.)

It may or may not work, and if it fails it will be mainly because of a really bad user experience. But let me get to the point: UX is important! Never forget it!

You get to the app at https://www.automercado.co.cr/aam/showMain.do, which has 4 main sections, as you can see here:

auto

And here is what they are for:
– On the left they have a directory-style menu organized by aisle. Grouping kind of works, in my opinion, if you are not too sure of what you want, but it is terribly slow and inefficient. They lose cookie points for this.

2014-07-02_0638

– Then in the middle they have a section where they display the products. This is very standard, so it kind of passes. However, they lose cookie points again for having products without pictures or with very weird stretching. They are a supermarket, and a big one, so I am sure they can send a guy with an iPhone to take a quick picture.

2014-07-02_0637

– The cart has a problem which is that they do not actually display the product name, only the description. Who thought of this? Not even something as simple as a tooltip!

2014-07-02_0640

And then here is the deal breaker for me: BAD SEARCH! As mentioned earlier, search is one of the most misunderstood functionalities in IT. A lot of people make huge mistakes because they assume search can be done with a database. It can, but the end result sucks! And it did suck here.

Let me show you this. I want to look for “jabon dial” which means “Dial Soap”. So I just type “Jabon Dial”. Should work, right? It doesn’t! Look at the message: “No results found…”. Also I hate the CAPS. There may be 1 technical reason I can think of but it is pretty dumb.

2014-07-02_0646

But why? If you look closely, there are 27 types of “Jabon Dial” when you type only “Dial”.

2014-07-02_0649

The problem lies here:
– The person that implemented this application had no knowledge of how search works, which is normal as search is pretty misunderstood.
– But humans don’t do search like engineers want. Having the user do a search exactly like the engineer wants is just lazy and ineffective.
– So the engineers who created this probably went for a simple exact match in a database search.
– This is a terrible user experience. I can bet the farm that Amazon would have closed its doors in the 1990s if they had had such a bad search.
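
To make the difference concrete, here is a minimal sketch in Python. I do not know how the site is actually implemented, so this only assumes the behavior described above: matching the whole phrase as a single string (similar to a SQL LIKE) versus matching each word separately. The catalog entries and the function names `phrase_match` and `token_match` are hypothetical, made up for illustration.

```python
def phrase_match(query, names):
    """Match the whole query as one substring, similar to SQL LIKE '%query%'."""
    needle = query.upper()
    return [name for name in names if needle in name.upper()]


def token_match(query, names):
    """Match names that contain every query word, in any order."""
    words = query.upper().split()
    return [
        name for name in names
        if all(word in name.upper().split() for word in words)
    ]


# Hypothetical catalog entries, not taken from the real site.
CATALOG = [
    "JABON ANTIBACTERIAL DIAL 3 PACK",
    "JABON LIQUIDO DIAL 250 ML",
    "JABON PROTEX AVENA",
]

print(phrase_match("Jabon Dial", CATALOG))  # [] because no name contains "JABON DIAL" verbatim
print(phrase_match("Dial", CATALOG))        # the two Dial products
print(token_match("Jabon Dial", CATALOG))   # the same two Dial products
```

Splitting the query into words is only the first thing a search engine does for you, but even this small step would have returned the products the user was looking for.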

How to fix it? Well, go learn how to use a search engine. And that’s why I created my course, Getting Started With Enterprise Search Using Apache Solr: http://www.pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr

Stemming and Multi Language

by Xavier Comments: 0

I received a question today on stemming and multi language. Basically, “why do we need multiple fields in our Solr in different languages and how do I test multi language stemming?”.

First of all, let’s explain what stemming is. Stemming involves reducing words to their stem (or base or root) during indexing and querying in an effort to improve recall.

For example, if a document includes the phrase “Xavier walked to work every morning from Westside Parkway” and a user searches for walk, then the results will correctly include that document, even though it contains walked rather than walk.
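
Here is a small sketch of the idea in Python. It is not how Solr stems words: it uses a toy suffix-stripping rule that I made up for illustration (the `SUFFIXES` list and the `naive_stem` function are assumptions), while real stemmers such as Porter are far more careful. It only shows why applying the same reduction at index time and at query time improves recall.

```python
SUFFIXES = ("ing", "ed", "s")  # toy suffix list, for illustration only


def naive_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the set of stemmed terms for a piece of text."""
    return {naive_stem(word) for word in text.split()}


doc = "Xavier walked to work every morning from Westside Parkway"

# Without stemming, the query term "walk" is not in the document.
print("walk" in set(doc.lower().split()))    # False

# With the same stemming applied to the document and the query, it matches.
print(naive_stem("walk") in index_terms(doc))  # True
```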

Search is one of the most misunderstood functionalities in IT

by Xavier Comments: 0

There is a phrase I use all the time: “Search is one of the most misunderstood functionalities in IT”. And I think it is very accurate.

The problem lies in two different aspects:

  1. Developers don’t know how to use search engines. And that is ok: search engines can be hard to tune appropriately, and it is a specialised niche. Some search engines are also awfully expensive.
  2. Developers are lazy. Let me explain this one.

Let’s say that I am setting up an application for selling cars. Potential customers always look for the same things: make, model, year, sorting by price, and so on. There is a set of metadata that is important and required to find what you are looking for. So what is the solution to this problem?

The usual answer: use a database where each field is stored in a separate column and query each field separately, just like in the following image. It is a mistake, or at least a UX horror. I hate database driven search, but that is just my personal opinion.

A typical database driven search input

The correct way of doing it is by providing a single search box. How? Like this:

A proper search box
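
As a rough sketch of what a single search box means for the car example, the Python below matches every word the user types against all of the metadata at once, instead of asking for one input per column. The car records and the `search` function are hypothetical, and a real search engine does much more (analysis, relevance ranking), so take it only as an illustration of the idea.

```python
# Hypothetical inventory, made up for illustration.
CARS = [
    {"make": "Toyota", "model": "Corolla", "year": 2012, "price": 9500},
    {"make": "Toyota", "model": "Yaris", "year": 2010, "price": 7000},
    {"make": "Honda", "model": "Civic", "year": 2012, "price": 10500},
]


def search(query, cars):
    """Return cars matching every query word in any field, cheapest first."""
    words = query.lower().split()
    hits = []
    for car in cars:
        # Treat make, model and year as one bag of searchable terms.
        terms = {car["make"].lower(), car["model"].lower(), str(car["year"])}
        if all(word in terms for word in words):
            hits.append(car)
    return sorted(hits, key=lambda car: car["price"])


for car in search("toyota 2012", CARS):
    print(car["make"], car["model"], car["year"], car["price"])
```

The user types "toyota 2012" in one box and never has to know which column holds the make and which holds the year.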

If you want to learn how, please click on the following link to my Pluralsight course to get started with enterprise search using Apache Solr!

pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

 

Installing Solr in Windows or Linux?

by Xavier Comments: 2

I have been a fan of Microsoft technologies all my life, probably because I’ve spent a lot of time working with .NET and related technologies. Eventually I also became an Apple fanboy, as some people have called me.

But something that I haven’t been called a fan of is Linux. Don’t get me wrong, I think Linux is extremely important, but in my case I have not worked with it as much as I think I should have.

But now I am in a part of my life where I need to run Apache Solr in a production environment. What do I do? What comes naturally.

In a nutshell, I set up a Windows machine in Amazon AWS, install Java, download Solr, run java -jar start.jar, modify solrconfig.xml, modify schema.xml, turn a few more knobs, and test. Once I am happy, I install Tomcat and voila, I have a single node for production. It is a small application with very few documents and reasonable traffic, so it is all good. And besides, it is amazing how much a Solr instance in AWS can handle.
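
For the "test" step, one quick check is to send a query to the select handler of the running instance. The sketch below only builds the URL, so it runs without a server. It assumes the port 8983 and the core name collection1 that the Solr example setup used at the time; both are assumptions, so adjust them to your install. The helper name `solr_select_url` is mine, not part of Solr.

```python
from urllib.parse import urlencode


def solr_select_url(query, base="http://localhost:8983/solr",
                    core="collection1", rows=10):
    """Build a URL for Solr's /select handler (assumed host, port and core)."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{base}/{core}/select?{params}"


# Open the printed URL in a browser, or fetch it with any HTTP client,
# once the instance is running.
print(solr_select_url("dial"))
print(solr_select_url("*:*"))  # match-all query, URL-encoded
```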

Anyway, my needs keep growing and I believe I need to set up a more resilient installation. Of course SolrCloud comes to mind, but I am also thinking about how the pros install Solr.

So what do I do? Install Solr in a Linux AMI. Also, as I now need monitoring in place, I set up SemaText. One downside of Windows is that, at least with SemaText, you can’t monitor on Windows, only on Linux.

And there you go, that is my piece of advice. And it does not come only from me: I’ve heard from many sources that Linux can be more performant and stable when running Apache Solr.

If you want more information on how to install Solr in a Linux instance, please follow this link to the Apache Solr Reference Guide: https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production

Also, if you want to learn more about Getting Started with Enterprise Search with Apache Solr, please follow this link to my course on this subject:
www.pluralsight.com/courses/discussion/enterprise-search-using-apache-solr

The Importance of Networking and Good People

by Xavier Comments: 0

Being an entrepreneur is hard. I have several things going at once (yes, a mistake) but I am moving forward. One of the key areas where I put a good amount of effort is creating Pluralsight trainings. And one of my trainings, where I put in a huge amount of work, is “Getting Started with Enterprise Search Using Apache Solr”, which takes a dev with 0 experience in Solr and a bit of .NET and in 3.4 hours teaches him or her how to build a working POC-style project with Solr and a .NET MVC UI.

You can watch the training here: pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

Getting to the point, Pluralsight recently acquired CodeSchool, and to celebrate they opened their library for free for 72 hours. So I announced in a couple of LinkedIn groups that the course on Solr would be free during that time, in case anyone wanted to take advantage of the offer.

I got a huge surprise when I saw a newsletter from Solr-Start (www.solr-start.com) announcing this. It turns out that Alexandre Rafalovitch, a well-known Solr popularizer and author, saw my notice and blasted off an email to his crowd.

It feels great when a good author shares your news in a newsletter! I didn’t even ask him to do this; he did it on his own, and for that I really have to thank him.

And by the way, if you are just getting started with Solr, his book Instant Apache Solr for Indexing Data How-to is an excellent resource that can help you understand how to index data. It has a lot of great tips and examples. I got it from Amazon a while back and it has helped me greatly. 100% recommended!

You can get it here:

https://www.packtpub.com/big-data-and-business-intelligence/instant-apache-solr-indexing-data-how-instant

Or on Amazon.com, where, as you can see, I bought it 1 year ago.

Free Sample + Collection Code Files       Instant Apache Solr for Indexing Data How-to

 

 

 

 

Getting Started with Enterprise Search Using Apache Solr

by Xavier Comments: 0

Enterprise search used to be neither for the faint of heart nor for those with a thin wallet. However, since the introduction of Apache Solr the name of the game has changed. Solr brings high quality enterprise search to the masses. Don’t leave home without it!

And let me help you get started! My intention is to create a series of posts where I can help you get started with Solr. This process can be easy if tackled with the appropriate resources, but it can be daunting if you choose the wrong ones.

I will start by describing what each module of my training covers. Click on a bullet to be taken directly to the post.

  • Why Solr & Enterprise Search?
  • Architecture of an Enterprise Search Application
  • Solr Configuration
  • Content: Schemas, Documents and Indexing
  • Searching & Relevance
  • Making it all Work: Put a UI on It!
  • Final Words

My course is available on Pluralsight: Getting Started with Enterprise Search using Apache Solr. You can watch it here:
http://pluralsight.com/training/courses/TableOfContents?courseName=enterprise-search-using-apache-solr

Solr training in Pluralsight