Whenever you want to start Solr, or many other search and big data applications, you need the Java Runtime Environment (JRE) as a prerequisite.
How do you find out if you have the JRE?
Open a command line and run:
java -version
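If you want to script this check, for example before starting Solr from a setup script, here is a minimal Python sketch (Python 3.7 or later; the function names are my own, not part of any tool). Note that java -version writes its output to standard error rather than standard output, and older releases report Java 8 as “1.8”.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major_version(version_output: str) -> Optional[int]:
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        # Legacy numbering: "1.8.0_151" means Java 8.
        return int(match.group(2))
    # Newer numbering: "11.0.2" means Java 11, "17" means Java 17.
    return first


def installed_java_major_version() -> Optional[int]:
    """Run `java -version` if a java executable is on the PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner to standard error, not standard output.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major_version(result.stderr)


if __name__ == "__main__":
    version = installed_java_major_version()
    if version is None:
        print("No Java runtime found on the PATH")
    else:
        print(f"Found Java {version}")
```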
In my Pluralsight courses, Getting Started with Enterprise Search Using Apache Solr and Implementing Search in .NET Applications, one thing I put a lot of emphasis on is how important search is, even though it is one of the most misunderstood functions of IT and development in general. In this post I will show you an example of how a potentially good app ends up being a pretty bad one, mainly because of its search capabilities.
It is such a central point that Pluralsight picked this phrase when it tweeted about the release of my course, as you can see here:
But now let’s get to the example. Here’s the scenario:
Problem: Life is busy. No time to go to the supermarket.
Solution: use your grocery store’s website to buy your food, and it gets delivered to your home the next day. It’s a charming idea. It did not work for Webvan, but it seems to be doing quite well for Amazon, and in my home town one of the major supermarkets is doing it in a more controlled way, with a good delivery service, all for $10. Not too scalable, but for an MVP it is OK. (Read The Lean Startup if you don’t know what an MVP is.)
It may work, or it may fail, mainly because of a really bad user experience, but let me get to the point. UX is important! Never forget it!
You reach the app at https://www.automercado.co.cr/aam/showMain.do, and it has four main sections, as you can see here:
And here is what they are for:
– On the left there is a directory organized by aisle. In my opinion, grouping kind of works if you are not too sure what you want, but it is terribly slow and inefficient. They lose brownie points for this.
– In the middle there is a section that displays the products. This is very standard, so it kind of passes; however, they lose brownie points again for having products without pictures or with very weirdly stretched images. They are a supermarket, and a big one, so I am sure they can send someone with an iPhone to take a quick picture.
– The cart has a problem: it does not display the product name, only the description. Who thought of this? Not even something as simple as a tooltip!
And then here is the deal breaker for me: BAD SEARCH! As I mentioned above, search is one of the most misunderstood functionalities in IT. A lot of people make huge mistakes because they think search can simply be done with a database, which it can, but the end result sucks! And it did suck here.
Let me show you. I want to look for “jabon dial”, which means “Dial soap”, so I just type “Jabon Dial”. That should work, right? It doesn’t! Look at the message: “No results found…”. I also hate the CAPS. I can think of one technical reason for them, but it is pretty dumb.
But why? If you type only “Dial” and look closely, there are 27 types of “Jabon Dial”.
The problem lies here:
– The person who implemented this application had no knowledge of how search works, which is normal, as search is widely misunderstood.
– But humans don’t search the way engineers want them to. Forcing users to search exactly the way the engineer expects is just lazy and ineffective.
– So the engineers who created this probably went for a simple exact match in a database query.
– This is a terrible user experience. I would bet the farm that Amazon would have closed its doors in the 1990s if it had such bad search.
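We can only guess at what the site does internally, but here is a small Python sketch of the difference, using made-up product names. The first function behaves like a phrase match in a database (think LIKE '%jabon dial%'); the second splits the query into words and ignores case, accents and word order, which is roughly what a search engine's analyzers do for you (in Solr, a tokenizer plus filters such as solr.LowerCaseFilterFactory and solr.ASCIIFoldingFilterFactory). The function names are hypothetical.

```python
import unicodedata
from typing import List

# Hypothetical catalogue entries for illustration, not the store's real data.
PRODUCTS = [
    "DIAL JABON ORO 3 UNIDADES",
    "JABÓN LÍQUIDO DIAL ANTIBACTERIAL",
    "DESODORANTE DIAL PARA HOMBRE",
    "JABON PALMOLIVE NATURALS",
]


def normalize(text: str) -> str:
    """Lowercase and strip accents, so "JABÓN" and "jabon" compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_match_search(query: str, products: List[str]) -> List[str]:
    """Naive search: the whole query must appear, word for word and in order,
    inside the product name (similar to SQL LIKE '%jabon dial%')."""
    q = query.lower()
    return [p for p in products if q in p.lower()]


def token_search(query: str, products: List[str]) -> List[str]:
    """Every query word must appear somewhere in the name, in any order,
    ignoring case and accents."""
    query_terms = normalize(query).split()
    results = []
    for p in products:
        product_terms = set(normalize(p).split())
        if all(term in product_terms for term in query_terms):
            results.append(p)
    return results


if __name__ == "__main__":
    print(exact_match_search("Jabon Dial", PRODUCTS))  # []
    print(token_search("Jabon Dial", PRODUCTS))
```

With these sample entries, the exact match finds nothing for “Jabon Dial”, while the word-based search returns the first two products, including the accented “JABÓN”.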
How do you fix it? Well, go learn how to use a search engine. That’s why I created my course, Getting Started With Enterprise Search Using Apache Solr: http://www.pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr
Something that really annoys me, especially when connecting remotely, is how the terminal goes blank while installing Cloudera Manager on Linux (CentOS).
Well, there is a very simple fix: run the following command and the terminal will not go blank:
sudo setterm -blank 0
Big Data
Big deal?
Big hype?
Or big change in our world?
I think the answer can be all of the above. “Hype?” you might be thinking. Well, here is the deal. Our world has changed in unimaginable ways. The amount of information created daily is reaching levels that just a few years ago would have been considered science fiction, or even plain old crazy.
Lots and Lots of Data
To make it even more interesting, a lot of it is unstructured data. That can be kind of a problem if we think about it, because the success of relational databases has taught a lot of us to think in terms of tables, rows and columns.
And this is not bad… at all. It is nice to have all your data and metadata organized neatly. You can use select, join, where, group by and more to get what you need.
But the success of relational databases can also create a blind spot for many. Just a few days ago I was talking with the VP and cofounder of a company that builds migration and artificial intelligence software, and that has had success (as well as a few failures, or learning experiences) in several world-class projects. They have lots of data coming from their automated code conversion tools, and what are they doing with it? Normalizing it into a database.
I don’t think it is a bad approach; however, it is not the one I would take. Long story short, I would store the logs as-is in their raw format and then use any of the available projects to analyse them in multiple ways, looking for key points, failures, trends and more. But what you do with the data is the topic of another post or a Pluralsight training. Let’s go back to our main point.
Mountains of data are being generated daily, and the amount will just continue to grow… no, explode.
Unstructured Data “Just Happens”
If you had to structure all your data, can you imagine what the cost would be? Just go ask your manager for an Oracle system and some servers to process all of your web server logs into tables. The cost would be exorbitant.
And besides cost, sometimes you may not know the structure of your data. That is one of the beautiful parts of Big Data: you can just store your logs in raw format and come back later to do your work, modelling your data in different ways. And what if you have too much data and the process is taking longer than expected?
Well, just add a few more servers and get the job done in parallel. Hadoop runs on commodity hardware, so you can get many relatively inexpensive machines to work together and process your data according to your needs.
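Here is a minimal Python sketch of the two ideas above, storing raw data and processing it in parallel, using the MapReduce pattern that Hadoop popularized. This is not Hadoop code: the sample log lines and function names are made up, and the chunks are processed one after another in a single process, where a cluster would hand each chunk to a different machine. The structure is only applied when the data is read, so the same raw lines can be modelled in two different ways.

```python
from collections import Counter
from functools import reduce
from typing import Callable, List

# Raw web server log lines, stored exactly as written (hypothetical sample data).
RAW_LOGS = [
    '10.0.0.1 - - [01/Jan/2016:10:00:00] "GET /products HTTP/1.1" 200',
    '10.0.0.2 - - [01/Jan/2016:10:00:01] "GET /cart HTTP/1.1" 500',
    '10.0.0.1 - - [01/Jan/2016:10:00:02] "GET /products HTTP/1.1" 200',
    '10.0.0.3 - - [01/Jan/2016:10:00:03] "POST /checkout HTTP/1.1" 500',
    '10.0.0.2 - - [01/Jan/2016:10:00:04] "GET /search HTTP/1.1" 404',
]


def split_into_chunks(lines: List[str], n_chunks: int) -> List[List[str]]:
    """Split the raw data the way a cluster spreads files across machines."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def count_status_codes(chunk: List[str]) -> Counter:
    """Map: the structure (status code = last field) is applied at read time."""
    return Counter(line.rsplit(" ", 1)[-1] for line in chunk)


def count_paths(chunk: List[str]) -> Counter:
    """Another map over the SAME raw lines, modelling them a different way."""
    return Counter(line.split('"')[1].split()[1] for line in chunk)


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Reduce: combine the partial results from two workers."""
    return a + b


def run_job(mapper: Callable[[List[str]], Counter], lines: List[str], workers: int = 3) -> Counter:
    chunks = split_into_chunks(lines, workers)
    # Each chunk is independent, so these map calls could run on separate machines;
    # on a single box, multiprocessing.Pool().map(mapper, chunks) has the same shape.
    partial_results = map(mapper, chunks)
    return reduce(merge_counts, partial_results, Counter())


if __name__ == "__main__":
    print(run_job(count_status_codes, RAW_LOGS))
    print(run_job(count_paths, RAW_LOGS))
```

Because each chunk is processed independently and the partial counts are simply added together, adding machines (or chunks) should not change the result, only how fast you get it.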
The Cloud and the Bar
And even better, remember “the cloud”! A few years ago, if you were a startup and needed beefy computing power, you needed a lot of upfront cash to cover the expense. Now, with AWS and Azure, you can turn on a few virtual machines, get a cluster up and running, crunch the data, get the results, turn the machines off and pay only for the time you used.
This change has lowered the entry bar for innovation. Now many brilliant ideas can be tested, and theories analysed, at a much lower cost, benefiting all mankind. For example, it is possible to run analyses on medical treatments to help cure cancer and many other diseases. Sometimes the answers to hard questions lie right there in the data; they just need to be discovered.
Hype or Go Figure This Hadoop Thing Out
But what about hype? Let me make this clear: I don’t think Big Data is hype. I do think there is a lot of hype around it, and even though we are able to do great things with Big Data, the general public does not yet fully understand what can be done and how. So I have made it a personal mission to help developers and the public in general understand Big Data (and Search).
So then it is time to ask ourselves this question:
What Are My Choices for Getting Started with Big Data?
Here is a collection of some interesting or fun articles that I have found on Big Data:
– 5 Reasons to Move to Big Data (and 1 Reason Why It Won’t Be Easy): gives an easy-to-understand set of selling points for adopting Big Data, while making clear some of the issues you might face.
– The Most Practical Big Data Use Cases Of 2016: covers some interesting use cases of Big Data. Remember, Big Data is sexy!
– Why ‘Big Data’ Means Nothing Without ‘Little Data’: Little Data is regular performance metrics.
– Why Big Data is the new competitive advantage: provides good points on how Big Data can help give you an edge.
– Big Data: What is it and Why it Matters: goes straight to the point to explain the basics.
– Big Data Analytics: What is it and Why it Matters: explains what Big Data Analytics is.
I have been working with Solr for a while, mainly from the .NET world, and I basically love it. I use SolrNet, which I think is a very mature and stable library. Today I was asked whether I have ever used SolrExpress and whether I recommend it over SolrNet.
The short answer is no, I have not used it, so I can’t give a fact-based recommendation. However, after looking over the source code of both libraries, my opinion is that SolrNet is still more complete, so I still believe SolrNet is the more sensible choice.
It is worth mentioning that this is a biased point of view, as I have used SolrNet multiple times and it has really made my life a lot easier.
Having said that, besides using it extensively, I have authored a few things around Solr and SolrNet. It works fine and I know it pretty well. It gets the job done, and it is pretty mature and almost complete (pending SolrCloud support and a few minor things, like a breaking change in collation).
Some of the things I created:
I created a Solr training for Pluralsight
https://www.pluralsight.com/courses/enterprise-search-using-apache-solr
And a SolrNet training for Pluralsight
https://www.pluralsight.com/courses/implementing-search-dotnet-applications
I wrote a book on Solr and SolrNet for the Succinctly series by a company called Syncfusion
https://www.syncfusion.com/resources/techportal/ebooks/apachesolr
I’ve also done internal trainings, presentations and webcasts on Solr + SolrNet
http://www.meetup.com/Atlanta-Net-User-Group/events/222161640/
SolrNet does not yet support SolrCloud in the main repository. There is one fork that already supports it, but our current project uses only the main repository, not forks. If that is not a blocker for your customer, go ahead; or, as in our case, just use a load balancer for querying and a call to the ZooKeeper API to get the leader for indexing.
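SolrNet itself is a .NET library, but to keep the examples in one language, here is a rough Python sketch of the indexing side of that workaround. Instead of talking to ZooKeeper directly, it asks any Solr node for the Collections API CLUSTERSTATUS action, which returns the cluster state Solr keeps in ZooKeeper, and picks out the replica marked as leader for each shard. The response layout (cluster → collections → shards → replicas with base_url, core and leader fields) is my assumption based on the Solr versions I know, so check it against yours; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def shard_leaders(cluster_status: dict, collection: str) -> Dict[str, str]:
    """Return {shard name: leader core URL} from a parsed CLUSTERSTATUS response."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    leaders = {}
    for shard_name, shard in shards.items():
        for replica in shard["replicas"].values():
            # The leader replica is flagged with "leader": "true"; accept a bool too.
            if str(replica.get("leader")).lower() == "true":
                leaders[shard_name] = f'{replica["base_url"]}/{replica["core"]}'
    return leaders


def fetch_cluster_status(solr_base_url: str, collection: str) -> dict:
    """Ask any Solr node for the cluster state it reads from ZooKeeper."""
    query = urllib.parse.urlencode(
        {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    )
    url = f"{solr_base_url}/admin/collections?{query}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


# Usage (placeholder host and collection name):
# leaders = shard_leaders(fetch_cluster_status("http://localhost:8983/solr", "products"), "products")
```

Queries can keep going through the load balancer; only the indexing calls would be sent to the URLs this returns.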
Hope this helps.
Yesterday I was coming back from the beautiful mountains of Monteverde in Costa Rica, feeling full of energy after a relaxing weekend. Monteverde is one of the most beautiful places I’ve been. Newsweek has declared Monteverde the world’s “#14 Place to Remember Before it Disappears.”
Anyway, on the drive back I stopped to check my email, as usual, and saw a contact form message from my blog, so I decided to read it. This is what I found: a note from Robert Stevens.
I received a question today about stemming and multiple languages. Basically: “Why do we need multiple fields in our Solr schema for different languages, and how do I test multilingual stemming?”
First of all, let’s explain what stemming is. Stemming involves reducing words to their stem (or base or root) during indexing and querying in an effort to improve recall.
For example, if a document includes the phrase “Xavier walked to work every morning from Westside Parkway” and a user searches for walk, the results will correctly include that document, because walked is reduced to the stem walk.
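To see why the language matters, here is a deliberately tiny Python sketch of stemming at index and query time. The suffix lists are made up for illustration (real stemmers such as Porter or Snowball have far more rules and exceptions), but they show the key point: English rules reduce “walked” to “walk” yet leave the Spanish “caminaba” untouched, so each language needs its own analysis. In Solr, one common way to get that is a separate field per language, each with its own analyzer, for example solr.PorterStemFilterFactory for English or solr.SnowballPorterFilterFactory with language="Spanish".

```python
from typing import Dict, List, Set

# Hypothetical, deliberately tiny suffix lists; not a real stemming algorithm.
SUFFIXES: Dict[str, List[str]] = {
    "en": ["ing", "ed", "s"],
    "es": ["ando", "aba", "ar"],
}


def stem(word: str, language: str) -> str:
    """Strip the longest matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES[language], key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str, language: str) -> Set[str]:
    """The same analysis runs at index time and at query time: split, lowercase, stem."""
    return {stem(token, language) for token in text.split()}


def matches(query: str, document: str, language: str) -> bool:
    """A document matches if every stemmed query term is among its stemmed terms."""
    return analyze(query, language) <= analyze(document, language)


if __name__ == "__main__":
    doc_en = "Xavier walked to work every morning from Westside Parkway"
    doc_es = "Xavier caminaba al trabajo cada mañana"
    print(matches("walk", doc_en, "en"))     # True
    print(matches("caminar", doc_es, "es"))  # True
    print(matches("caminar", doc_es, "en"))  # False: wrong language rules
```

A simple way to test multilingual stemming, under these assumptions, is exactly what the last three lines do: index a sentence in each language and query it with a different inflection of the same word, once with the matching rules and once with the wrong ones.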
I was having a conversation today with someone who needed some help teaching Agile to his PMs. I had a very simple response: get them started by watching the excellent courses available on Pluralsight.
So, the first thing I told him was:
– Agile has proven to be a successful methodology in software… when done right.