Before you can start Solr, or most other Java-based search and big data applications, you need the Java Runtime Environment (JRE) installed.
How do you find out if you have the JRE?
Open the command line and run:
java -version
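If you want to run the same check from a script, here is a minimal Python sketch. It assumes the java launcher is expected on the PATH, and the function name java_version is my own, not part of any tool. Note that java -version writes its banner to stderr rather than stdout.

```python
import shutil
import subprocess


def java_version(executable="java"):
    """Return the first line of `<executable> -version`, or None if it is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # The java launcher prints its version banner to stderr, not stdout.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    banner = (result.stderr or result.stdout).strip()
    return banner.splitlines()[0] if banner else None


if __name__ == "__main__":
    version = java_version()
    print(version if version else "No java executable found on the PATH")
```

If the function returns None, the JRE is either not installed or not on the PATH.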
The other day, while on a trip, I needed to finish a task on one of my servers and had to remote into one of my Cloudera QuickStart VMs to run a test. So I installed TeamViewer to access it. The steps are simple:
Click on the Download TeamViewer link for RedHat, CentOS, Fedora, SUSE to get the rpm package from the downloads page:
https://www.teamviewer.com/en/download/linux/
Open a terminal, go to the downloads directory and run:
sudo yum localinstall teamviewer_12.0.71510.i686.rpm
And then start it with:
teamviewer
Big Data
Big deal?
Big hype?
Or big change in our world?
I think that the answer can be all of the above. “Hype” you might be thinking? Well, here is the deal. Our world has changed in unimaginable ways. The amount of information created daily is reaching levels that just a few years ago would’ve been considered science fiction or even plain old crazy.
Lots and Lots of Data
To make it even more interesting, a lot of it is unstructured data. That can be kind of a problem if we think about it, because the success of relational databases has taught a lot of us to think in terms of rows, columns and relations.
And this is not bad… at all. It is nice to have all your data and metadata organized neatly. You can use select, join, where, group by and more to get what you need.
But the success of relational databases can also create a blind spot for many. Just a few days ago I was talking with the VP and cofounder of a company that builds migration and artificial intelligence software and that has had success (as well as a few failures, or learning experiences) in several world-class projects. They have lots of data that they obtain from their automated code conversion tools, and what are they doing with it? They are normalizing it into a database.
I don’t think it is a bad approach; however, it is not the one that I would take. Long story short, I would store the logs as is, in their raw format, and then use any of the available projects to analyse them in multiple ways, looking for key points, failures, trends and more. But what you do with the data is the topic of another post or a Pluralsight training. Let’s go back to our main point.
Mountains of data are being generated daily and the amount will just continue to explode.
Unstructured Data “Just Happens”
If you had to structure all your data, do you imagine what the cost would be? Just go ask your manager for an Oracle system and some servers to process all of your web server logs to put them in tables. The cost would be exorbitant.
And besides cost, sometimes you may not know the structure of your data. That is one of the beautiful parts of Big Data: you can just store your logs in raw format and come back later to do your work, modelling your data in different ways. And what if you have too much data and the process is taking longer than expected?
Well, just add a few more servers and get the job done in parallel. Hadoop runs on commodity hardware, so you can get many relatively inexpensive machines to work together and process your data according to your needs.
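To make the "store it raw, process it in parallel" idea concrete, here is a small Python sketch. It is not Hadoop: worker threads stand in for servers, and the function names and the assumed log layout (status code as the second-to-last field, as in the common log format) are mine, for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_status_codes(lines):
    """Map step: count HTTP status codes in one chunk of raw log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumption: the status code is the second-to-last field,
        # as in the common log format.
        if len(fields) >= 2:
            counts[fields[-2]] += 1
    return counts


def parallel_count(lines, workers=4):
    """Split the lines into chunks, count each chunk, then merge the partial counts."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_status_codes, chunks))
    # Reduce step: merge the per-chunk counters into one total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

The structure (status codes, in this case) is only imposed when the logs are read, so the same raw lines can be modelled differently later just by swapping the map function.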
The Cloud and the Bar
And even better, remember “the cloud”! A few years ago, if you were a startup and needed beefy power, you would need a lot of upfront cash to cover expenses. Now with AWS and Azure we can turn on a few virtual machines, get a cluster up and running, crunch the data, get the result, turn them off and only pay for the time we use.
And this change has lowered the entry bar for innovation. Now many brilliant ideas can be tested and theories can be analysed at a much lower cost, benefiting all mankind. For example, it is possible to run analysis on medical treatments to help cure cancer or many other diseases. Sometimes answers to hard questions lie right there in the data; they just need to be discovered.
Hype or Go Figure This Hadoop Thing Out
But what about hype? Let me make this clear: I don’t think Big Data is hype. I do think that there is a lot of hype around it, and even though we are able to do great things with Big Data, the greater public does not yet fully understand what can be done and how. So I have taken on a personal mission to help developers and the public in general understand Big Data (and Search).
So then it is time to ask ourselves this question:
What Are My Choices for Getting Started with Big Data?
Here is a collection of some interesting or fun articles that I have found on Big Data:
– 5 Reasons to Move to Big Data (and 1 Reason Why It Won’t Be Easy): gives an easy to understand set of selling points on why to adopt Big Data, while making clear some of the issues you might face.
– The Most Practical Big Data Use Cases Of 2016: covers some interesting use cases of Big Data. Remember, Big Data is sexy!
– Why ‘Big Data’ Means Nothing Without ‘Little Data’: Little Data is regular performance metrics.
– Why Big Data is the new competitive advantage: provides good points on how Big Data can help give you an edge.
– Big Data: What is it and Why it Matters: goes straight to the point to explain the basics.
– Big Data Analytics: What is it and Why it Matters: explains what Big Data Analytics is.
I have been working with Solr for a while, mainly from the .NET world, and I basically love it. I use SolrNet, which I think is a very mature and stable library. I was asked today if I have ever used SolrExpress and if I recommend it over SolrNet.
The short answer is no, I have not used it. Therefore I can’t give a fact-based recommendation, but looking over the source code of both libraries, it is my opinion that SolrNet is still more complete. So I still believe SolrNet to be the more sensible choice.
It is worth mentioning that this is a biased point of view, as I have used SolrNet multiple times and it really has made my life a lot easier.
Besides using it extensively, I have authored a few things around Solr and SolrNet. It works fine and I know it pretty well. It basically gets the job done, and it is pretty mature and almost complete (pending SolrCloud support and a few minor things like a breaking change on collation).
Some of the things I created
I created a Solr training for Pluralsight
https://www.pluralsight.com/courses/enterprise-search-using-apache-solr
And a SolrNet training for Pluralsight
https://www.pluralsight.com/courses/implementing-search-dotnet-applications
I wrote a book on Solr and SolrNet for a company called SyncFusion, for their Succinctly Series
https://www.syncfusion.com/resources/techportal/ebooks/apachesolr
I’ve also done internal trainings, presentations and webcasts on Solr + SolrNet
http://www.meetup.com/Atlanta-Net-User-Group/events/222161640/
SolrNet does not yet have support for SolrCloud in the main repository. There is one fork that already supports it, but our current project does not use forks, only the main repository. If using a fork is not a blocker for your customer, go ahead; otherwise, as in our case, just use a load balancer for querying and a call to the ZooKeeper API to get the leader for indexing.
Hope this helps.
Yesterday I was coming back from the beautiful mountains of Monteverde in Costa Rica, feeling full of energy after a relaxing weekend. Monteverde is one of the most beautiful places I’ve been. Newsweek has declared Monteverde the world’s #14 “Place to Remember Before it Disappears.”
Anyway, on the drive back I stopped and decided to check my email, as usual, and I saw a contact form submission from my blog, so I decided to check it. This is what I found, a note from Robert Stevens:
I received a question today on stemming and multi-language support. Basically, “why do we need multiple fields in our Solr in different languages and how do I test multi language stemming?”.
First of all, let’s explain what stemming is. Stemming involves reducing words to their stem (or base or root) during indexing and querying in an effort to improve recall.
For example, if a document includes the phrase “Xavier walked to work every morning from Westside Parkway” and a user searches for walk, then the results will correctly include that document, even though it contains walked rather than walk.
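To illustrate the idea, here is a deliberately naive Python sketch. It is not the algorithm Solr uses; the suffix list and the function names toy_stem and matches are my own, and real stemmers handle far more cases. It applies the same reduction to the indexed text and to the query, which is why walk finds walked.

```python
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, text):
    """True if any word in the text shares a stem with the query term."""
    stems = {toy_stem(token) for token in text.split()}
    return toy_stem(query) in stems


if __name__ == "__main__":
    phrase = "Xavier walked to work every morning from Westside Parkway"
    print(matches("walk", phrase))
    print(matches("walking", phrase))
```

The suffix list is English-only, which hints at why each language usually gets its own field with its own analysis: rules that suit one language do not suit another.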
I was having a conversation today with a person who needed some help teaching his PMs Agile. I had a very simple response: get them started by watching the excellent trainings available on Pluralsight.
So, the first thing I told him was:
– Agile has proven to be a successful methodology in software… when done right
There are multiple ways of creating cores in Solr, and they are very straightforward. One way is to call Solr’s REST admin API with action=CREATE, and another is to use bin\solr.cmd. If you call the API directly, however, you could run into a small issue. Let me quickly explain the scenario.
First of all, you can create using solr.cmd with the following command:
bin\solr.cmd create -c <nameofthecore>
And a fresh new core is created. The command echoes back the call it made: http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses
So what if you are curious and decide to make the call directly yourself (changing the core name, of course):
http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses
Well, it does not work!
The hint in the error is that Solr can’t find some resources, namely solrconfig.xml. To solve this issue, you only need to specify which base configuration you want to use. So the call would be:
http://localhost:8983/solr/admin/cores?action=CREATE&name=othercourses&instanceDir=othercourses&configSet=basic_configs
And presto, you get your core! A little detail, but worth knowing what was missing.
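If you create cores from a script, the URL can be assembled rather than typed by hand. This is a minimal Python sketch: the helper name core_create_url is hypothetical, the host and port are the ones from this post, and the config set name basic_configs is also taken from this post and may differ in your Solr version.

```python
from urllib.parse import urlencode


def core_create_url(name, config_set="basic_configs",
                    base="http://localhost:8983/solr"):
    """Build the admin URL that creates a core from a named config set."""
    params = {
        "action": "CREATE",
        "name": name,
        # As in the calls above, the instance directory is named after the core.
        "instanceDir": name,
        # Without configSet the call fails because solrconfig.xml is not found.
        "configSet": config_set,
    }
    return f"{base}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    print(core_create_url("othercourses"))
```

Using urlencode also takes care of escaping if a core name ever contains characters that are not URL-safe.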
A couple of days ago I got asked: how do we monitor our cluster? Well, there are professional tools, and there are other options for budget-conscious deployments. Here are a few options that came to my mind:
Just a few thoughts I wanted to share.