It seems like every time you blink, a new framework gets created or a new language comes along.
In some cases, you can just ignore all these new shiny things… but maybe, just maybe this new framework, language, or service can help make your life easier.
But how do you stay up to date?
That’s where I come in. I will be posting several articles where I go deeper into the world of tech, with a primary focus around everything Big Data.
Some of the topics that I will cover include the leading Big Data products, their origins, how and when to use them, and why they matter.
And if you are tight on time, then I have more good news for you. Each one of these posts will come with a video, so that you can listen to a particular topic while you are at the gym, commuting, or when you need something to put you to sleep.
Here’s the list of what we have published and what’s coming in the near future:
Welcome to Big Data TV – Or The One That Started It All
This is just the intro post, which tells you a bit more of what I am going to be covering next.
You might have heard or seen the term Big Data. The term refers to data sets that are too large or complex to be dealt with through traditional processing applications.
In fact, the information within these data sets is so enormous that it can’t be stored or processed on one server. Instead, it might take calls to several devices to retrieve the data. Even then, processing time can still be incredibly slow.
This is where Hadoop comes in. Developed in 2005 by a pair of Apache software engineers, the platform creates a distributed model to store large data sets within computer clusters. In turn, these clusters work together to execute programs and handle potential issues.
So, how did we get to this point in the world of digital information? Did it appear without notice, or did the concept of large data sets gradually form?
Let’s get into some history on the creation of Big Data and its connections with Hadoop.
Beyond The Information Age
The concept of Big Data predates the Information Age. Individuals and groups have dealt with large amounts of information for centuries.
For instance, John Graunt had to deal with volumes of information during the Bubonic Plague of the 17th century. When he compiled the data into logical groups, he created a set of statistics, and he eventually became known as the father of demographics.
Issues with large data continued after this, as did the development of solutions. In 1881, Herman Hollerith created a tabulating machine that used punch cards to process census data. In 1927, Fritz Pfleumer invented a procedure to store data on a strip of magnetic tape. As more data was collected, the means to store and sort it had to change; there wasn’t any choice, as the information became increasingly complicated. Consider, for example, the amount of calculations required by NASA and other space agencies to launch successful programs.
Move Into Popular Culture
However, this didn’t match the accumulation of data collected once computers were made available to the public. It reached enormous sizes when those users learned about the internet. Add smart devices, artificial intelligence, and the Internet of Things (IoT), and “Big” has become exponentially huge.
Consider what is part of this label. Social media is a large piece of it. Credit card companies and other groups that handle Personally Identifiable Information (PII) also produce large amounts of information. Banks and other financial firms create well beyond trillions of data bytes in a single hour.
The Official Term
It wasn’t until 2005 that this concept was given the name we know today, coined by Roger Mougalas, a director of market research at O’Reilly Media. At the time, he used it to refer to a set of information that was nearly impossible to process with traditional business tools, including Relational Database Management Systems (RDBMS) like Oracle.
What could a business or government entity do at that point? Even without excessive information from mobile devices, there was still a large volume of data to compile and analyze. This is where two Apache designers — Doug Cutting and Mike Cafarella — came into play.
Computer Clusters And Large Data
In 2002, these engineers started work on the Apache Nutch product. Their goal was to build a new search engine that could quickly index one billion pages of information. After extensive research, it was determined the creation of Nutch would be too expensive. So, the developers went back to the drawing board.
Over the next two years, the team studied potential resolutions. They discovered two technological white papers that helped. One was on the Google File System (GFS) and the other was on MapReduce. Both discussed ways to handle large data sets as well as index them to avoid slowdowns.
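To make the second of those ideas concrete, here is a minimal, single-machine sketch of the MapReduce pattern in Python (the sample documents are made up for illustration; a real MapReduce system distributes the map and reduce phases across the nodes of a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group the pairs by word and sum the counts.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Two tiny "documents" stand in for billions of indexed pages.
docs = ["Big Data needs big clusters", "Hadoop handles big data"]
word_counts = reduce_phase(map_phase(docs))
print(word_counts["big"])   # 3
print(word_counts["data"])  # 2
```

Because each map call only looks at one document, and each reduced key is independent of the others, both phases can run in parallel on separate machines, which is exactly what made the approach a fit for an indexing problem the size of Nutch’s.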
This is when Cutting and Cafarella decided to utilize these two principles and create an open source product that would help everyone index these large amounts of data. In 2005, they created the first edition of the product, then realized it needed to run on computer clusters to work properly. A year later, Cutting moved the Nutch product to Yahoo.
It’s here he got to work. Cutting removed the distributed computing parts of Nutch to create the framework for Hadoop. He got the name from a toy elephant his son owned.
With GFS and MapReduce as a foundation, Cutting created the open source platform to operate on thousands of computer nodes. In 2007, it was successfully tested on 1,000 nodes. In 2011, the software was able to sort a petabyte of data, the equivalent of 1,000 terabytes, in 17 hours, and the product became available to everyone that same year.
Of course, this is not the end to the story of solutions needed for the index of large data. Technology continues to change, especially if outside influences make more of us head to our computers. There will come a time when something more powerful will be required than multiple storage nodes.
Until then, we thank those who have already gone through the steps to help all of us retrieve large amounts of data in the quickest and most efficient way possible.
Did your company have to transition immediately to WFH?
Are you now at a disadvantage because you had your small business phone system all set up, but it does not work now that everyone is at home, or it has become quite expensive?
Here is the solution that has worked the best for me over the years, using Skype Manager—which does not involve any setup and has a reduced cost. You just need to install Skype, which runs on pretty much anything: any computer OS, tablets, iOS, Android… you name it.
Here’s my business case. I own a support center that provides support to a tech company.
We are in Costa Rica, their customers are in the US and Canada.
When I was asked to implement the phone system, I had several options, which included setting up an Asterisk PBX or looking for other solutions.
I had tried Asterisk before but it had several drawbacks.
So what I did was use Skype Manager to invite the collaborators, assign each one a subscription so that they can make calls, and assign each one a landline number so that they can receive calls.
You can allocate credit in case they are calling an area not covered by their subscription, you can allocate a Live Chat button if you want to, or Skype Connect in case you need to integrate with an existing SIP-enabled PBX.
You have very good control of how you spend your money.
And there are plans for everywhere.
Each plan even has multiple options tailored to your needs.
The cost savings, easy setup, and control are amazing.
I know, there are newer options like RingCentral or others that provide good functionality.
But this one worked for me, and it gets the job done.
The lovely thing about this tool is that it is pretty easy to use. One nice feature of Jing is that it puts a small sun in the corner of your screen; hover over it and it expands, showing the available options.
You then select which part of the screen you want to capture, and then it gives you the option to add text, arrows, and more.
Then you can save locally, to your clipboard, or upload to TechSmith servers and it gives you a URL.
This last one is quite nice as you can share immediately.
By far, Jing is the best tool that I’ve used for the last 10 years. Hopefully the transition to TechSmith Capture won’t let me down.
Oh, and it works with images as well as short videos!
What are you waiting for? Download the tool and share away.
Work from Home 101 – Costa Rica Edition
I live in Costa Rica, which may come in handy if you need any travel tips.
However, because of “life”, I’ve spent my entire career working on projects in either the US or the UK, hence I’ve done this remote thing (or distributed, use your choice of words) for a while.
I began as an employee, but I eventually transitioned into entrepreneurship, where I have a few things going on, including my Pluralsight courses, a tech support center, and supporting Cloudera from time to time.
But let’s cut to the chase. I am not going to bother you with details or anecdotes—I’ll do a few posts some other day on those.
Today, I am going to tell you what has worked for me over the years.
Hopefully, this will help you too.
First of all, work from home (WFH) is still work. The fact that you are at home does not mean that it is not serious business.
How others perceive you is going to be a reflection of your actions, including the quality of your work and how responsive you are.
So, one of the things that I suggest is to set a schedule. In my case, for many years, despite the fact that in many of my projects I did not have to “clock in”, I still had a predictable work schedule.
I also took advantage of starting early, as that allowed me to work for a few hours undisturbed and focused, before the meeting requests start coming in, or the “can you help me for just a sec on this?”.
Something important here is that you need to be flexible too. In this pandemic, with work from home, home schooling, and quarantine in general, I am modifying schedules a bit.
I am still working on a predictable schedule, where people can reach me, but I am also using the late night hours to work as they provide quieter times—invaluable when you are focusing on something hard.
Next tip, create a dedicated work area. I have to admit that even though I work remote/distributed, I’ve done a lot of my work in a really small room where I have all my recording equipment. This includes the dedicated recording machine, Whisperroom, and all kinds of equipment that I cannot fit at home.
Although at the moment I haven’t been there in more than a month, as I am now truly, fully working from home.
This means that I have a very small desk and I am using an iPad as a second monitor.
This works quite well. Although if you think about it, working at an office means you are moving all the time too. How many times have you moved to a conference room for a meeting or some focused time?
Same here. The kitchen counter-top is pretty high, so I am using it as a standing desk. It is far from the cooking utensils, so it works quite well.
Talking about moving, here is another tip.
Don’t sit down for hours straight. Try to get up every now and then, move a bit around. Your health comes first, so if possible also throw in some exercise every now and then.
Yes, exercise is important. It helps you think better and I don’t think I need to convince you of this.
Here’s more. Stick to a ritual. Just as I mentioned that having a schedule works well, try to also have a ritual of how you approach your work.
The more that you make your daily work a habit, the more the subconscious will take over and you will move forward faster.
Here, it is important that you block all distractions. It is said that whenever you are deeply focused on something and someone interrupts you, it takes about 15 minutes to get back to what you were doing.
What if you get interrupted every 10 minutes? Well, you get my point…
But on the other hand, you still have to be responsive.
If you are part of a team, let them know that you will be focused on your work, and that you will check for messages at a certain interval.
But don’t forget to be responsive and prioritize your work. I once had a remote worker not respond for about half an hour. When she finally showed up, she said, “well, I was reading a novel and the chapter was pretty interesting”.
For her, being responsive should have been the top priority, as she was in charge of distributing work.
I turn off all Whatsapp, email, Slack, and other notifications. But I check them after I finish a certain amount of work.
Besides, as we all know, getting those messages releases endorphins, which makes you addicted to checking “what’s new”.
If you make a habit of completing something and then checking, then this will help you be more productive.
Additionally, on checking what’s new… try to reduce your consumption of those things that do not add value to your life.
Instead of binging on Netflix, get a new skill by binging on Pluralsight.
Did I mention they are free during April? https://www.pluralsight.com/offer/2020/free-april-month
And remember that what you learn is yours for life!
I’ll leave it here today, but tomorrow I will come back with more on what can help you with this new WFH (for many) situation.
Give the meeting a representative name. If it is an impromptu meeting, then the name is not that important, but once you start using Meet often, use representative names so that you and your attendees remember what each meeting is for. Then click on Continue.
Step #4 Start the meeting
Congratulations! You have a meeting now. But you still need to join your meeting. Click on Join now to get started.
By default you will join using computer audio. Your camera will most likely be on. Turn it off if you are in “quarantine-not-presentable-mode”.
The meeting information is displayed. You can share it with the other participants, or click on Add people.
You can also copy the link in the address bar and share it.
Many years ago, Pele (Edson Arantes do Nascimento) visited Costa Rica. I was a kid and someone recommended I should get an autograph. And so I did.
To be honest, it is the only autograph I’ve ever asked for. Here is the reason.
He may have been the greatest soccer (or futbol) player on Earth, but hearing about how K. Scott Allen (Ode To Code) passed away today got me thinking that people like Scott are the ones who people should ask for autographs.
Why? Because people like him are the ones uplifting others, helping them, and teaching – a great deal of it through Pluralsight. Nothing against Pele; he was great, but the impact that a teacher like Scott had can change the lives of everyone he helped.
Anyway, this may not be a popular opinion, especially among soccer fans, but I do believe in the power of teaching.
Today I took on the task of updating a .NET application that I’ve had since sometime around 2013.
I created this application with a few people back in the days of .NET 4.0 and Solr 4.10, with its corresponding version of SolrNet.
Today I moved it to Solr 8.4 and .NET 4.7.
There are a few interesting changes that you need to take into account, which I may expand on at some point. But just in case you are in the same scenario, here are some upgrade tips for you to consider:
#1 When you move from Solr 4.10 (and older versions) to 8.4, there are some changes to take into account:
The default field is now _text_ and not text
Some types may have changed
The default schema file is now managed-schema, not schema.xml
Some changes are required in solrconfig.xml
So, what I did was download, install, and start Solr 8.4.
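As a quick illustration of that first change, here is a small Python sketch of building a query URL that targets the new _text_ default field instead of the old text field (the host, port, and the core name mycore are hypothetical defaults; adjust them to your own setup):

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # default local Solr address; adjust as needed

def build_query_url(core, user_query, default_field="_text_"):
    # In Solr 8.x the default search field is the catch-all _text_ field,
    # whereas a 4.10-era schema typically used a field simply named text.
    params = {"q": user_query, "df": default_field, "wt": "json"}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"

# An old 4.10-style query would have passed df=text; against 8.4, use _text_.
url = build_query_url("mycore", "hadoop")
print(url)  # http://localhost:8983/solr/mycore/select?q=hadoop&df=_text_&wt=json
```

The same idea applies on the SolrNet side: anywhere the old code relied on the implicit text default field, either pass df explicitly or add a matching copyField in the managed schema.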
Hello and welcome, I am Xavier Morera and I am very passionate about helping developers understand enterprise search and Big Data.
And today, I welcome you to the first post of the Big Data Inc Series (which will soon be joined with Big Data TV).
So, you might be wondering… what is the Big Data Inc Series? Easy. It is a series of bite size posts that explain enterprise search and Big Data.
What is my objective? At a high level, each post will take between 5 and 7 minutes and will provide an overview of one particular topic – and only one – to give you enough information to understand the purpose of a particular platform, language, project, or anything else that touches enterprise search and Big Data.
Why am I doing this? First of all, I am really passionate about search and Big Data… like a kid on Christmas day. I do have to agree that I have my preferred platforms, languages, and projects. However, it does not hurt to have an idea of what each one is about.
Also, why are the posts so short? Well, I could go on and on for hours – believe me, or at least my friends, who say that a 45-minute presentation for me is just warming up – but the point is that I want to be very concise, straight to the point, and give you an overall idea. The Big Data Series is not meant to be a set of tutorials. For training, I have several courses at Pluralsight on topics like Spark, Cloudera CDH, Solr, Hue, Hive, JSON, code profiling, and more – and I have also delivered and helped with trainings for Cloudera and Microsoft/HP/Intel.
I will cover a topic, give you a general idea, and let you decide if this is a technology that could be useful in your toolbelt. In many cases, I will point you in the direction of where to go learn more or I will tell you a story or two of how these technologies are used in real life.
So please join me on this journey with the Big Data Series. In our next post, we will talk about how Big Data started, with Hadoop. Also, don’t forget to subscribe to be notified of newly released posts and videos, and to like and share. You can also follow the links below in the description.
If you prefer instructor-led training, Cloudera has a great course, with amazing instructors to teach you Solr. If you were not aware, Cloudera Search is actually Solr, but running on top of a Hadoop cluster. So hello, Big Data!