The History of Everything Around Big Data

by Xavier Comments: 0

The tech world changes fast… really fast.

It seems like every time you blink, there is a new framework that gets created or a new language comes along.

In some cases, you can just ignore all these new shiny things… but maybe, just maybe this new framework, language, or service can help make your life easier.

But how do you stay up to date?

That’s where I come in. I will be posting several articles where I go deeper into the world of tech, with a primary focus around everything Big Data.

Some of the topics that I will cover include which Big Data products are leading the field, their origins, how and when to use them, and why they matter.

And if you are tight on time, then I have other good news for you. Each one of these posts will come with a video so that you can hear about a particular topic while you are at the gym, commuting, or perhaps need something to put you to sleep.

Here’s the list of what we have published and what’s coming in the near future:

Welcome to Big Data TV – Or The One That Started It All 

This is just the intro post, which tells you a bit more of what I am going to be covering next.

Check out the post here or the video here

Here’s what’s coming next:

The Story of Hadoop and Why Should I Care?

by Xavier Comments: 0

You might have heard or seen the term Big Data. The term refers to data sets that are too large or complex to be dealt with through traditional processing applications.

In fact, the information within these data sets is so enormous it can’t be stored or processed on one server. Instead, it might take calls to several machines to retrieve the data. Even then, processing time can still be incredibly slow.

Distributed Computing

This is where Hadoop comes in. Developed in 2005 by a pair of Apache software engineers, the platform uses a distributed model to store large data sets across computer clusters. In turn, these clusters work together to execute programs and handle potential issues.

So, how did we get to this point in the world of digital information? Did it appear without notice, or did the concept of large data sets gradually form?

Let’s get into some history on the creation of Big Data and its connections with Hadoop.

Beyond The Information Age

The concept of Big Data goes beyond the Information Age. Individuals and groups have dealt with large amounts of information for centuries.

For instance, John Graunt had to deal with volumes of information during the Bubonic Plague of the 17th century. When he compiled the data into logical groups, he created a set of statistics. Graunt eventually became known as the father of demographics.

Issues with large data occurred after this, as did the development of solutions. In the 1880s, Herman Hollerith created a tabulating machine that used punch cards, and it was used to process the 1890 US Census. In 1927, Fritz Pfleumer invented a procedure to store data on a strip of magnetic tape.

As more data was collected, the means to store and sort it changed. There wasn’t any choice as the information became increasingly complicated. One example is the number of calculations required by NASA and other space agencies to launch successful programs.

Move Into Popular Culture

However, this didn’t match the accumulation of data collected once computers were made available to the public. It reached enormous sizes when those users learned about the internet. Add smart devices, artificial intelligence, and the Internet of Things (IoT), and “Big” has become exponentially huge.

Consider what is part of this label. Social media is a large piece of it. Credit card companies and other groups that handle Personally Identifiable Information (PII) also produce large amounts of information. Banks and other financial firms generate trillions of bytes of data in a single hour.

The Official Term

It wasn’t until 2005 that this process was given the name we know today. The term was coined by Roger Mougalas, a director of market research at O’Reilly Media. At that time, he referred to it as a set of information that was nearly impossible to process with traditional business tools. That includes Relational Database Management Systems (RDBMS) like Oracle.

What could a business or government entity do at that point? Even without excessive information from mobile devices, there was still a large volume of data to compile and analyze. This is where two Apache designers — Doug Cutting and Mike Cafarella — came into play.

Computer Clusters And Large Data

In 2002, these engineers started work on the Apache Nutch product. Their goal was to build a new search engine that could quickly index one billion pages of information. After extensive research, it was determined the creation of Nutch would be too expensive. So, the developers went back to the drawing board.

Over the next two years, the team studied potential resolutions. They discovered two technological white papers that helped. One was on the Google File System (GFS) and the other was on MapReduce. Both discussed ways to handle large data sets as well as index them to avoid slowdowns.
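To make the MapReduce idea a bit more concrete, here is a minimal single-machine sketch of the classic word count. It is only an illustration of the map, shuffle, and reduce steps; the function names are made up for this post and it is not how Hadoop itself is implemented.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # On a real cluster, each document (or block) would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data is big", "data about data"])
print(counts)
```

The point is that the map and reduce steps only look at one piece of data at a time, which is what lets the work be spread over many machines.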

This is when Cutting and Cafarella decided to utilize these two principles and create an open source product that would help everyone index these large data amounts. In 2005, they created the first edition of the product, then realized it needed to be established on computer clusters to properly work. A year later, Cutting moved the Nutch product to Yahoo.

It’s here he got to work. Cutting removed the distributed computing parts of Nutch to create the framework for Hadoop. He got the name from a toy elephant his son owned.

With GFS and MapReduce, Cutting created the open source platform to operate on thousands of computer nodes. In 2007, it was successfully tested on 1000 nodes. In 2011, the software was able to sort a petabyte of data, which is equal to 1000 terabytes, in 17 hours. The product became available to everyone that same year.
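To put that sort figure in perspective, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 petabyte = 1000 terabytes, 1 terabyte = 1000 gigabytes), and the function name is just for illustration.

```python
PETABYTE_IN_TERABYTES = 1000  # decimal units, as used in the text
TERABYTE_IN_GIGABYTES = 1000


def sort_throughput_gb_per_second(terabytes, hours):
    """Average throughput needed to process `terabytes` of data in `hours`."""
    gigabytes = terabytes * TERABYTE_IN_GIGABYTES
    seconds = hours * 3600
    return gigabytes / seconds


rate = sort_throughput_gb_per_second(PETABYTE_IN_TERABYTES, 17)
print(f"{rate:.1f} GB/s")  # prints "16.3 GB/s"
```

That is an average of roughly 16 gigabytes every second across the whole cluster, sustained for 17 hours.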

Of course, this is not the end to the story of solutions needed for the index of large data. Technology continues to change, especially if outside influences make more of us head to our computers. There will come a time when something more powerful will be required than multiple storage nodes.

Until then, we thank those who have already gone through the steps to help all of us retrieve large amounts of data in the quickest and most efficient way possible.

Easy VOIP Calling for a Small Business

by Xavier Comments: 0

Did your company have to transition immediately to WFH?

Are you now at a disadvantage because your small business phone system was all set up, but it does not work when everyone is at home, or it gets quite expensive?

Here is the solution that has worked best for me over the years: Skype Manager, which does not involve any setup and has a reduced cost. You just need to install Skype, which runs on pretty much any computer OS, tablet, iOS, Android… you name it.

Here’s my business case. I own a support center that provides support to a tech company.

We are in Costa Rica, their customers are in the US and Canada.

When I was asked to implement the phone system, I had several options, which included setting up an Asterisk PBX or looking for other solutions.

I had tried Asterisk before but it had several drawbacks.

So what I did was use Skype Manager to invite the collaborators, assign them a subscription so that they can make calls, and assign them a landline number to receive calls.

You can allocate credit in case they are calling an area not covered by their subscription, you can allocate a Live Chat button if you want to, or Skype Connect in case you need to integrate with an existing SIP-enabled PBX.

You have very good control of how you spend your money.

And there are plans for everywhere.

Each plan even has multiple options tailored to your needs.

The cost savings, easy setup, and control are amazing.

I know, there are newer options like RingCentral or others that provide good functionality.

But this one worked for me, and it gets the job done.

Hope it helps.

Tip of the Day: The Best Screen Capture Tool

by Xavier Comments: 1

One of the things about working from home is that you can’t just pick up your laptop, turn it around, and tell your coworker: “look here, this is what I need”.

If you are remote or distributed, the story is different. You have to share in a particular way. You could start a screen sharing session, but that may be overkill.

Here is where screen capture comes to the rescue.

The “standard” way is to press the print screen key (PrtScr), open mspaint, paste, save, and then send via email or chat.

Well, that does not sound very convenient.

Let me tell you about a lovely tool called Jing that I have been using for many years, although it is now known as TechSmith Capture.

The lovely thing about this tool is that it is pretty easy to use. One nice feature of Jing is that it puts a small sun in the corner of your screen, so you can hover over it and it expands showing the available options.

You then select which part of the screen you want to capture, and then it gives you the option to add text, arrows, and more.

Then you can save locally, to your clipboard, or upload to TechSmith servers and it gives you a URL.

This last one is quite nice as you can share immediately.

By far Jing is the best tool that I’ve used for the last 10 years. Hopefully the transition to TechSmith Capture won’t let me down.

Oh, and it works with images as well as short videos!

What are you waiting for? Download the tool and share away.

Working from Home 101

by Xavier Comments: 0

Work from Home 101 – Costa Rica Edition

I live in Costa Rica, which may come in handy if you need any travel tips.

However, because of “life”, I’ve spent my entire career working on projects in either the USA or the UK, hence I’ve done this remote thing (or distributed, use your choice of words) for a while.

I began as an employee, but I eventually transitioned into entrepreneurship where I have a few things going on, including my Pluralsight courses, a tech support center, and supporting Cloudera from time to time.

But let’s cut to the chase. I am not going to bother you with details or anecdotes—I’ll do a few posts some other day on those.

Today, I am going to tell you what has worked for me over the years.

Hopefully, this will help you too.

First of all, work from home (WFH) is still work. The fact that you are at home does not mean that it is not serious business.

It is.

How others perceive you is going to be a reflection of your actions, including the quality of your work and how responsive you are.

So, one of the things that I suggest is to set a schedule. In my case, for many years, despite the fact that in many of my projects I did not have to “clock in”, I still had a predictable work schedule.

I also took advantage of starting early, as that allowed me to work for a few hours undisturbed, focused, before you start to get asked to attend meetings or “can you help me for just a sec on this?”.

Something important here is that you need to be flexible too. In this pandemic, with work from home, home schooling, and quarantine in general, I am modifying schedules a bit.

I am still working on a predictable schedule, where people can reach me, but I am also using the late night hours to work as they provide quieter times—invaluable when you are focusing on something hard.

Next tip, create a dedicated work area. I have to admit that even though I work remote/distributed, I’ve done a lot of my work in a really small room where I have all my recording equipment. This includes the dedicated recording machine, Whisperroom, and all kinds of equipment that I cannot fit at home.

At the moment, though, I haven’t been there in more than a month, as I am truly, fully working from home now.

This means that I have a very small desk and I am using an iPad as second monitor.

This works quite well. Although if you think about it, working at an office means you are moving all the time too. How many times have you moved to a conference room for a meeting or some focused time?

Same here. The kitchen counter-top is pretty high, so I am using it as a standing desk. It is far from the cooking utensils, so it works quite well.

Talking about moving, here is another tip.

Don’t sit down for hours straight. Try to get up every now and then, move a bit around. Your health comes first, so if possible also throw in some exercise every now and then.

Yes, exercise is important. It helps you think better and I don’t think I need to convince you of this.

Here’s more. Stick to a ritual. Just as I mentioned that having a schedule works well, try to also have a ritual of how you approach your work.

The more that you make your daily work a habit, the more the subconscious will take over and you will move forward faster.

Here, it is important that you block all distractions. It is said that whenever you are working on something and you are deeply focused, if someone interrupts you, then it will take 15 minutes to get back to what you were doing.

What if you get interrupted every 10 minutes? Well, you get my point…

But on the other hand, you still have to be responsive.

If you are part of a team, let them know that you will be focused on your work, and that you will check for messages at a certain interval.

But don’t forget to be responsive and prioritize your work. I had a remote worker once not respond for like half an hour. When she finally showed up, she said “well I was reading a novel and the chapter was pretty interesting”.

For her, being responsive was her top priority as she was in charge of distributing work.

I turn off all Whatsapp, email, Slack, and other notifications. But I check them after I finish a certain amount of work.

What’s more, as we all know, getting those messages releases some endorphins, which makes you addicted to checking “what’s new”.

If you make a habit of completing something and then checking, then this will help you be more productive.

Additionally, on checking what’s new… try to reduce your consumption of those things that do not add value to your life.

Instead of binging on Netflix, get a new skill by binging on Pluralsight.

Did I mention they are free during April?

And remember that what you learn is yours for life!

I’ll leave it here today, but tomorrow I will come back with more on what can help you with this new WFH (for many) situation.

Sharing Screen for Remote Work 101 (Mother’s Edition)

by Xavier Comments: 0

The coronavirus outbreak is in full force right now.

We are currently going through very tough times, unprecedented times for most of us. They are perhaps not comparable to a war, but they are still challenging and full of fear because of the economic repercussions.

Quarantine and remote work are new for many. Today, my mother asked me a question that is simple for me, yet hard for her.

How do I share my screen with my workers?

Here is a really simple guide for her (and you) to follow, using Google Meet.


She sent me this message: “You need to be logged in with an account that can start meetings.” It means that she needs to log in.

Step #1 Open Google Meet

Navigate to Google Meet and you will be greeted by a screen like this one:

Step #2 Start the Meeting

Click on Join or start a meeting

Step #3 Name the Meeting

Give the meeting a representative name. If it is an impromptu meeting, then the name is not that important. When you start using Meet often, use representative names so that you and your attendees remember what each meeting is for.

Click on Continue.

Step #4 Start the meeting

Congratulations! You have a meeting now. But you need to join your meeting. Click on Join now to get started.

By default you will join using computer audio. Your camera will most likely be on. Turn it off if you are in “quarantine-not-presentable-mode”.

The meeting information is displayed. You can share it with the other participants, or click on Add people.

You can also copy the link in the address bar and share it.

Step #5 Sharing the Screen

In the bottom right corner, you can click on Present now to share your screen. Everyone in the meeting can share the screen, not only you.

You need to select which monitor to share. Select it and click Share.


  • You can mute, leave call (don’t touch the red one, you will leave the call), or turn off the camera.
  • You can use the Chat window, in the top right, to pass URLs back and forth.
  • You can automagically add Meet (Hangouts) when you are creating a meeting invite. Click on the Add conferencing below the Add location.

Step #6 Stop Presenting your Screen

When you are done presenting, click on Stop presenting. This is important because if you leave it open, the other participants will keep seeing what you are doing.

Step #7 Leaving the Meeting

Click on the red telephone to leave the meeting.


Autographs – Who to Ask For One?

by Xavier Comments: 0

Many years ago, Pele (Edson Arantes do Nascimento) visited Costa Rica. I was a kid and someone recommended I should get an autograph. And so I did.

To be honest, it is the only autograph I’ve ever asked for. Here is the reason.

He may have been the greatest soccer (or futbol) player on Earth, but hearing about how K. Scott Allen (Ode To Code) passed away today got me thinking that people like Scott are the ones who people should ask for autographs.

Why? Because people like him are the ones uplifting others, helping them, and teaching, a great deal of it with Pluralsight. Nothing against Pele, he was great, but the impact that a teacher like Scott has on the world can change the lives of those he helps.

Anyway, this may be a not too popular opinion, especially for soccer fans, but I do believe in the power of teaching.

Upgrading a .NET Application with Solr and SolrNet

by Xavier Comments: 2

Today I took on the task of updating a .NET application that I’ve had since sometime around 2013.

I created this application with a few people at the time of .NET 4.0 and Solr 4.10, with its corresponding SolrNet.

Today I moved it to Solr 8.4 and .NET 4.7.

There are a few interesting changes that you need to take into account, which I may expand on at some point. But just in case you are in the same scenario, here are some upgrade tips for you to consider:

#1 When you move from Solr 4.10 (and older versions) to 8.4, there are some changes to take into account

  • The default field is now _text_ and not text
  • Some types may have changed
  • Default now is managed-schema, not schema.xml
  • Some changes are required in solrconfig.xml

So, what I did was download, install, and start Solr 8.4.

Then I created a core

bin\solr.cmd create -c <name>

Next, I configured Solr so that I do not use schemaless mode. Use this link for more info: Switching from Managed Schema to Manually Edited schema.xml

Don’t forget that besides switching to ClassicIndexSchemaFactory, you also need to disable schema guessing, which allows unknown fields to be added to the schema during indexing.
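One way to turn off schema guessing is through the Config API, by setting the update.autoCreateFields user property to false. Here is a minimal Python sketch of that request. It assumes a core created from the default configset, Solr listening on localhost:8983, and a core called mycore, which is just a placeholder. The helper name is mine, and the snippet only builds the request without sending it.

```python
import json
import urllib.request


def build_disable_guessing_request(core, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a Config API call that disables field guessing."""
    payload = {"set-user-property": {"update.autoCreateFields": "false"}}
    return urllib.request.Request(
        url=f"{base_url}/{core}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "mycore" is a placeholder core name.
request = build_disable_guessing_request("mycore")
print(request.get_method(), request.full_url)
# To send it against a running Solr: urllib.request.urlopen(request)
```

If your solrconfig.xml does not come from the default configset, the property may not be wired to anything, so check the update processor chain there instead.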

Then I indexed some data. This didn’t change much.

Now, a fun one. I have a custom request handler, which wasn’t working. Oh dear, I forgot for a second that the qt parameter no longer works unless you explicitly configure Solr to allow it.

This is straightforward. Comment out your select request handler and set the handleSelect attribute to true in the requestDispatcher node, like this:

<requestDispatcher handleSelect="true" >

Restart Solr and happy searching!
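Once Solr is back up, a request that relies on the qt parameter looks roughly like the one built below. This is only a sketch: the core name mycore, the handler name mysearch, and the query are placeholders, and the helper function is something I made up for illustration. It just builds the URL, so you can paste it in a browser or use it from your client code.

```python
from urllib.parse import urlencode


def build_qt_query_url(core, handler, query, base_url="http://localhost:8983/solr"):
    """Build a /select URL that routes to a custom handler through qt."""
    params = urlencode({"q": query, "qt": handler})
    return f"{base_url}/{core}/select?{params}"


# "mycore" and "mysearch" are hypothetical names; use your own core and handler.
url = build_qt_query_url("mycore", "mysearch", "title:hadoop")
print(url)
```

If the handler name in qt does not match a handler defined in solrconfig.xml, Solr will report an unknown handler, which is a quick way to confirm that the dispatcher is actually reading the parameter.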

Learning Apache Solr – Online Training – Instructor Led Training – Book

by Xavier Comments: 0

Search is one of the most misunderstood functionalities in IT. Everyone takes it for granted unless it is missing or badly implemented.

The other day I was asked, “How can I learn search with Solr?”

There are many ways, and I’ve done what I can to help others learn enterprise search. Here are three resources:

Pluralsight Online Solr Training

I created two trainings that teach you what you need to know to get started with Solr and create a search API with Solr and SolrNet (oriented towards Microsoft-centric technologies, i.e. C#).

Best part is that it is only $29 a month to get a subscription to Pluralsight and you can learn about many other topics that are relevant for your career.

Getting Started with Enterprise Search Using Apache Solr

Implementing Search in .NET Applications

Cloudera Search Instructor-led Training

If you prefer to take an instructor-led training, Cloudera has a great training, with amazing instructors to teach you Solr. If you were not aware, Cloudera Search is actually Solr but running on top of a Hadoop cluster. So hello Big Data!

Cloudera Search Training

SyncFusion Apache Solr and SolrNet Book

I published a book on Solr for SyncFusion. It is part of the Succinctly Series, so it is a condensed resource that helps you get started. And it is free.

Apache Solr Succinctly


Hope they help. Ping me on twitter @xmorera if you have any questions!

Deploying Cloudera on Microsoft Azure

by Xavier Comments: 0

Are you interested in deploying Cloudera on Azure? If so, I invite you to watch this course that I created at Cloudera for Microsoft that teaches you how to install and deploy Cloudera on Azure in multiple different ways. Best of all, it is a free course! Please follow this link to watch Deploying and Scaling Cloudera on Microsoft Azure.

The modules covered are:

  • The Building Blocks of Microsoft Azure for Deploying Cloudera
  • Cloudera on Azure – Cloud Deployment Best Practices & Patterns
  • Deploying CDH on Microsoft Azure Using Cloudera Manager & Azure Marketplace
  • Automating Deployments in Microsoft Azure Using Cloudera Director
  • Cloudera Altus in Azure Cloud – Machine Learning and Analytics as Platform-as-a-Service
  • Final Words