Stock Market and Unemployment: Trump vs. Obama According to Artificial Intelligence (Machine Learning)

by Xavier

Disclaimer

Let me start off by saying this: this post is not intended to be a political statement. 

It is not my intention to start a red vs. blue thing. I simply want to look at the data and find out whether Trump's claims of the “biggest economic growth in history” hold up from the perspective of Artificial Intelligence (AI), using several different Machine Learning algorithms.

The Statement and the Methodology

For the last three-and-change years, the most repeated phrase from 45 has been “biggest economic growth ever,” or something of that sort.

But… does this stand true?

Lucky for us, the data is public. It is possible to check the unemployment records in the US as well as the stock market. 

There are even plenty of charts that try to show the evolution of both unemployment and the stock market across the Obama and Trump eras.

But, has the US been “on a roll and recovery” during Trump or is he just reaping the benefits of what Obama did?

There are plenty of articles and serious publications that try to show this, including:

– US 2020 election: The economy under Trump in six charts from the BBC

– Trump boasts the US economy is the best it’s ever been under his watch. Here are 9 charts showing how it compares to the Obama and Bush presidencies from Business Insider

– The Trump vs. Obama economy — in 16 charts from Washington Post

But what if we take a different approach?

 

A Different Approach with Machine Learning (AI)

Let’s take the data for both unemployment and the stock market and use the Obama era as our training set and the Trump era as our testing set.

That way, we can test whether Trump actually helped improve the economy in terms of creating jobs and making the stock market grow.

Here’s a repository with the unemployment and stock market data. Go ahead, create your own Machine Learning model to determine if the US grew economically because of Trump or simply continued from what Obama created.

https://github.com/bigdataincorg/stocks_employment 

If you create a model, go ahead and share your results. Create a pull-request if you want.
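To make the setup concrete, here is a minimal sketch of the idea in Python. The numbers below are made up for illustration (the real monthly series lives in the repository above), and a real analysis may use fancier models than a straight line:

```python
# Train on the "Obama era", then extrapolate into the "Trump era" and
# compare the extrapolation against what actually happened.
import numpy as np

# Hypothetical monthly unemployment rates (%): a synthetic downward trend
train_months = np.arange(96)               # 96 months standing in for 2009-2016
train_rate = 10.0 - 0.06 * train_months    # made-up numbers, not the real data

# Fit a linear trend on the training period only
slope, intercept = np.polyfit(train_months, train_rate, 1)

# Extrapolate the trend into the test period (the next administration)
test_months = np.arange(96, 132)
predicted = slope * test_months + intercept

print(round(predicted[0], 2))  # first predicted month: 4.24
```

If the actual test-period rates track the extrapolation closely, the decline is explained by the pre-existing trend rather than by a policy break.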

Unemployment rate

Obama inherited an unemployment rate of close to 10% when he started his term, in large part because of the 2008 crisis (the Great Recession, as some call it).

When he handed the keys to the White House to Trump in January 2017, the unemployment rate was a tad over 4%.

That unemployment trend held for the next few years, until it got to about 3.5% pre-Covid.

If I ignore Covid and simply use Obama as my training data, then I am able to predict unemployment for the next few years until the pandemic began.

 

The Machine Learning model predicted quite well, but as you can see, unemployment declined only modestly after he took office.

Maybe that last stretch is a bit harder to achieve, I can agree on that, but my point is that there was no real change: the trend just stayed the same.

 


As for the stock market's pre-Covid-19 record highs: the wealthiest ten percent of shareholders own more than ninety percent of all stocks, and the 2017 tax cut mostly benefited corporations and the rich, with middle-class households getting a tax cut of about $930 while the top one percent enjoyed a cut of more than $50,000.

The unemployment rate pre-Covid-19 was indeed at a half-century low, but this rate had been falling steadily since 2011, so we can't really see a difference since he took office. Even though 6.6 million jobs were created under the Trump administration, we shouldn't forget that Obama inherited an economy in the worst financial crisis since the Great Depression. Job creation under the Trump administration is therefore merely a continuation of an improving job market and can't be compared to the turnaround in the early years of the Obama administration.

Monthly job growth was higher under Obama than in the first two years after Trump took office. Obama added over 1.6 million more jobs in his last three years in office compared to Trump’s first three years. 


Regarding the stock market, here is a chart below. In blue we can see the Obama years. The stock market more than doubled during his time. 

It did grow under Trump too. Some people say that is all because Obama provided a stable, drama-free environment for companies to grow and thrive, while others say the economy boomed (I would say “kept booming”) because of Trump's policies.

Well, my Machine Learning model basically thinks that the stock market kept growing just like the momentum that you gain when someone pushes you on a bike… you keep going because you got the right push.
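That bike-push intuition can be sketched the same way: fit a compound-growth trend (linear in log prices) to the earlier era and roll the same momentum forward. The prices below are synthetic placeholders, not the actual index data from the repository:

```python
# Momentum as a regression: the slope of log-price vs. time is the
# monthly compound growth rate, which we simply extrapolate.
import numpy as np

months = np.arange(96)                    # "Obama era", synthetic
prices = 800.0 * np.exp(0.009 * months)   # ~0.9% made-up monthly growth

# Regress log prices on time to capture the compound-growth trend
growth, log_start = np.polyfit(months, np.log(prices), 1)

# Roll the same momentum into the next term
future = np.arange(96, 132)
projected = np.exp(log_start + growth * future)

print(round(float(projected[0])))  # 1898
```

If the actual post-2016 prices sit on top of `projected`, the market "kept going because it got the right push."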


There are likely two reasons the stock markets have risen: the tax cuts, and the Federal Reserve keeping interest rates low and flooding the markets with cash. So how did the stock market perform under Trump's leadership compared to Obama's? We are doing our best to keep this a data-driven piece and not turn it into a political viewpoint.

And we want to point out again that no one can claim responsibility for the stock market; only slight influences can be observed.

We can observe a steady climb through most of Trump's first two years. But the market still did better during Obama's first three years as president.

Even though, as mentioned before, it is not valid to attribute stock market performance to one person, we can track data over longer time periods and see that US stocks perform better under Democrats.

(Forbes reports that from March 4, 1929 through July 5, 2016, U.S. stocks returned an average of 1.71% under Republican administrations and 10.83% under Democratic administrations. An updated analysis would narrow the gap, but it would still be significant.)

In essence, the stock market was thriving under both presidents, leaving us to think that capitalism acts regardless of who is politically in charge.

The stock market was up 46% under Obama compared to 25% under Trump. Obama ended his presidency with one of the best gains of any president in modern history. When Trump started and introduced his tax cut, stocks were high, but they declined progressively once he started his trade war.

His 2019 stock market gains are still minimal compared to his predecessors'. Stocks grew faster after the reelection of both Clinton and Obama, and the 28.6% growth during Trump's third year pales next to Obama's 32% while recovering from a financial crisis.

"Economic Boom"

Trump likes to credit himself with bringing about an economic boom the likes of which the world has never seen, and he feels he has launched the great American comeback. But in reality he didn't attain his goal of raising the economy's growth rate to four percent; there was only a small increase in GDP growth. His tax cut did create growth in 2018, but the effects were largely offset by the trade war. We could argue that economic growth performed slightly better than under Obama, but not in comparison with all of his recent predecessors.

Trump especially uses GDP as an example of his success and a major reason for his reelection. It is worthwhile to mention that presidents can’t really take credit for the state of the economy. There are many factors that have an impact on growth that have nothing to do with their policies. 

To be totally accurate, Trump started his presidency with a steady economy, unlike Obama, whose term began amid a serious recession. Trump frequently points out that the US economy is the best it has ever been. This is not the case if we take into account wage growth or business investment. And GDP growth under Trump doesn't reach his promised 3% mark annually. It is true that the economy improved under Trump, but the recovery began under Obama.

Again there are a multitude of factors to consider when measuring the state of the economy, therefore it is not actually valid to ascribe the situation to one president. Here’s how the Trump administration’s economic accomplishments actually compare to Obama’s.

If we do compare Trump's GDP growth to Obama's over a period of three years, the numbers show that Trump's is actually slower: Obama's last three years produced more growth than Trump's first three. Claiming that he built the greatest-ever US economy before the coronavirus outbreak is not exactly true either. The economy was doing well, but again, that started during the Obama administration, and there were periods when it was a lot stronger. The annual average growth was roughly similar during both presidencies.

As for the tax cuts Trump introduced, they were beneficial to economic growth for about a year, but they didn't pay for themselves and created a Federal budget deficit of $1 trillion, something that had never occurred outside a recession. In fact, GDP growth was higher on average under Obama in 2014 and 2015 than under Trump in 2017 and 2018.

Tax cuts

“We lowered our business tax from the highest in the developed world down to one that’s not only competitive, but one of the lower taxes.”
Donald Trump

There are no signs that capital spending and wages are increasing because of the tax cut. The tax cut only boosted the net worth of CEOs and stockholders, and left a debt of about $2.9 trillion. It did not meet the goal of more investment in new equipment and factories: there was a slight increase in business spending in 2018, but it has since declined heavily, mainly because of the trade war.

Federal debt

During Obama's presidency the national debt swelled as he tried to rebuild the economy after the financial crisis, but by the end of his term the annual deficit had significantly declined. Because of Trump's tax cut and an increase in government spending, the annual deficit has gone up considerably again.

Conclusion

We thought it would be interesting to look at Trump's economic growth claims with a fairly neutral, data-driven approach. We see that there wasn't significant extra growth during Trump's pre-Covid years and that stock markets function fairly independently of who is politically in charge. We were able to weaken the validity of some of Trump's claims and point out the fallacies in many of his statements. The data indicate that there were no significant increases and that his wins are merely a continuation of what was put into action before he took office.

To conclude, it is safe to say that the pre-pandemic growth only followed the trend and the economy wasn't exactly booming like it had never boomed before. At least, that's what my Machine Learning model says.

A big thank you to Viva Lancsweert and Humberto Barrantes for helping me on researching this topic and creating the nice charts.

The History of Everything Around Big Data

by Xavier


The tech world changes fast… really fast.

It seems like every time you blink, there is a new framework that gets created or a new language comes along.

In some cases, you can just ignore all these new shiny things… but maybe, just maybe this new framework, language, or service can help make your life easier.

But how do you stay up to date?

That’s where I come in. I will be posting several articles where I go deeper into the world of tech, with a primary focus around everything Big Data.

Some of the topics that I will cover include the leading Big Data products: their origins, how and when to use them, and why they matter.

And if you are tight on time, I have more good news for you. Each one of these posts will come with a video, so you can hear about a particular topic while you are at the gym, commuting, or perhaps in need of something to put you to sleep.

Here’s the list of what we have published and what’s coming in the near future:

Welcome to Big Data TV – Or The One That Started It All 

This is just the intro post, which tells you a bit more of what I am going to be covering next.

Check out the post here or the video here

Here’s what’s coming next:

The Story of Hadoop and Why Should I Care?

by Xavier

You might have heard or seen the term Big Data. The term refers to data sets that are too large or complex to be dealt with through traditional processing applications.

In fact, the information within these data sets is so enormous that it can't be stored or processed on one server. Instead, it might take calls to several devices to retrieve the data, and even then processing time can still be incredibly slow.

Distributed Computing

This is where Hadoop comes in. Developed in 2005 by a pair of Apache software engineers, the platform creates a distributed model to store large data sets within computer clusters. In turn, these clusters work together to execute programs and handle potential issues.

So, how did we get to this point in the world of digital information? Did it appear without notice, or did the concept of large data sets gradually form?

Let’s get into some history on the creation of Big Data and its connections with Hadoop.

Beyond The Information Age

The concept of Big Data goes beyond the Information Age. Individuals and groups have dealt with large amounts of information for centuries.

For instance, John Graunt had to deal with volumes of information during the Bubonic Plague of the 17th century. When he compiled the data into logical groups, he created a set of statistics. Graunt eventually became known as the father of demographics.

Issues with large data occurred after this, as did the development of solutions. In 1881, Herman Hollerith created a tabulating machine that used punch cards to calculate the 1880 Census. In 1927, Fritz Pfleumer invented a procedure to store data on a strip of magnetic tape.

As more data was collected, the means to store and sort it changed. There wasn't any choice, as the information became increasingly complicated: consider, for example, the number of calculations required by NASA and other space agencies to launch successful programs.

Move Into Popular Culture

However, this didn’t match the accumulation of data collected once computers were made available to the public. It reached enormous sizes when those users learned about the internet. Add smart devices, artificial intelligence, and the Internet of Things (IoT), and “Big” has become exponentially huge.

Consider what falls under this label. Social media is a large piece of it. Credit card companies and other groups that handle Personally Identifiable Information (PII) also produce large amounts of information, and banks and other financial firms generate trillions of bytes of data in a single hour.

The Official Term

It wasn’t until 2005 that this process was given the name we know today. The term was coined by Roger Mougalas, a director of market research at O’Reilly Media. At the time, he referred to it as a set of information that was nearly impossible to process with traditional business tools, including Relational Database Management Systems (RDBMS) like Oracle.

What could a business or government entity do at that point? Even without excessive information from mobile devices, there was still a large volume of data to compile and analyze. This is where two Apache designers — Doug Cutting and Mike Cafarella — came into play.

Computer Clusters And Large Data

In 2002, these engineers started work on the Apache Nutch product. Their goal was to build a new search engine that could quickly index one billion pages of information. After extensive research, it was determined the creation of Nutch would be too expensive. So, the developers went back to the drawing board.

Over the next two years, the team studied potential resolutions. They discovered two technological white papers that helped. One was on the Google File System (GFS) and the other was on MapReduce. Both discussed ways to handle large data sets as well as index them to avoid slowdowns.

This is when Cutting and Cafarella decided to utilize these two principles and create an open source product that would help everyone index these large data amounts. In 2005, they created the first edition of the product, then realized it needed to be established on computer clusters to properly work. A year later, Cutting moved the Nutch product to Yahoo.

It’s here he got to work. Cutting removed the distributed computing parts of Nutch to create the framework for Hadoop. He got the name from a toy elephant his son owned.

With GFS and MapReduce, Cutting created the open source platform to operate on thousands of computer nodes. In 2007, it was successfully tested on 1,000 nodes, and in 2011 the software was able to sort a petabyte of data (equal to 1,000 terabytes) in 17 hours. The product became available to everyone that same year.

Of course, this is not the end to the story of solutions needed for the index of large data. Technology continues to change, especially if outside influences make more of us head to our computers. There will come a time when something more powerful will be required than multiple storage nodes.

Until then, we thank those who have already gone through the steps to help all of us retrieve large amounts of data in the quickest and most efficient way possible.

Welcome to Big Data TV – Or The One That Started It All

by Xavier

Hello and welcome, I am Xavier Morera and I am very passionate about helping developers understand enterprise search and Big Data.

And today, I welcome you to the first post of the Big Data Inc Series (which will soon be joined with Big Data TV).

So, you might be wondering… what is the Big Data Inc Series?  Easy. It is a series of bite size posts that explain enterprise search and Big Data.

What is my objective? At a high level, each post will take between 5 and 7 minutes and will provide an overview of one particular topic – and only one – to give you enough information to understand the purpose of a particular platform, language, project, or anything else that touches enterprise search and Big Data.

Why am I doing this? First of all, I am really passionate about search and Big Data… like a kid on Christmas day. I do have to agree that I have my preferred platforms, languages, and projects.  However, it does not hurt to have an idea of what each one is about.

Also, why are the posts so short? Well, I could go on and on for hours – believe me, or at least my friends who say that a 45-minute presentation for me is just warming up – but the point is that I want to be very concise and straight to the point, and give you an overall idea. The Big Data Series posts are not meant to be tutorials. For training I have several courses at Pluralsight covering topics like Spark, Cloudera CDH, Solr, Hue, Hive, JSON, code profiling and more, and I have also created and helped on trainings for Cloudera, Microsoft, HP, and Intel.

I will cover a topic, give you a general idea, and let you decide if this is a technology that could be useful in your toolbelt. In many cases, I will point you in the direction of where to go learn more or I will tell you a story or two of how these technologies are used in real life.

So please join me on this journey with the Big Data Series. In our next post, we will talk about how Big Data started, with Hadoop. Also, don't forget to subscribe to be notified of newly released posts and videos, and to like and share. You can also follow the links below in the description.

And as we Costa Ricans say, pura vida!

 

A Few Resources to Get Started with Search and Big Data

by Xavier

The other day I saw a question about where to start learning Big Data. It dawned on me that I have created a few resources that might be useful, so I am sharing them here. It feels good to have a few resources that can help people get started.

If you want to set up Hadoop clusters using Cloudera you could watch these online trainings:

Creating Your First Big Data Hadoop Cluster Using Cloudera CDH

Preparing a Production Hadoop Cluster with Cloudera: Databases

Deploying Hadoop with Cloudera CDH to AWS

Deploying and Scaling Cloudera Enterprise on Microsoft Azure (this one is FREE)

They get you started with a development cluster, then a production-grade cluster, then a deployment in the AWS cloud and then on Azure, including a module on managed Big Data with Cloudera Altus.

Once you have a cluster, you can watch this course to use HUE to work with Hive, Pig, Impala and more.

Take Control of Your Big Data with HUE in Cloudera CDH

If you want to learn about search engines, you can check out these courses on Solr:

Getting Started with Enterprise Search Using Apache Solr

Implementing Search in .NET Applications

And regarding Spark, which IMHO is one of the best platforms you can learn right now: you can take either of these courses, which help you get started with either Python or Scala.

Developing Spark Applications with Python & Cloudera

Developing Spark Applications Using Scala & Cloudera

I hope this helps. IMHO, learning Big Data is one of the best moves that you can make at the moment.

Next Conference: Pluralsight LIVE 2018

by Xavier

And so it is time to get ready for my next conference, Pluralsight LIVE, which will take place August 28-30 in Salt Lake City. I will be presenting on how to deploy Cloudera clusters on Microsoft Azure. Hope to see you there!

Use discount code 3061 to register at  https://www.pluralsight.com/event-details/2018/live-2018/registration

 

The Art of Creating Applications That Have Search

by Xavier

In my Pluralsight trainings, Getting Started with Enterprise Search Using Apache Solr and Implementing Search in .NET Applications, one of the things I emphasize quite a bit is how important search is, and yet it is one of the most misunderstood functions of IT and development in general. In this post I will show you an example of how a potentially good app turns out to be a pretty bad app, mainly because of its search capabilities.

So much so that on Twitter, Pluralsight selected the phrase “search is everywhere” to tweet about the release of my course.

But now let’s get to the sample. Here’s the scenario:

Problem: Life is busy. No time to go to the supermarket

Solution: use your grocery store's web site to purchase your food and have it delivered home the next day. Charming idea. It did not work for Webvan, but it seems to be doing quite well for Amazon, and in my home town one of the major supermarkets is doing it in a more controlled way with a good delivery service, all for $10. Not too scalable, but for an MVP it is OK. (Read The Lean Startup if you don't know what an MVP is.)

It may work, or maybe not, mainly because of a really bad user experience. But let me get to the point: UX is important! Never forget it!

You get to the app in https://www.automercado.co.cr/aam/showMain.do and they have mainly 4 sections as you can see here


And here is what they are for:
– On the left they have a directory-style listing organized by aisle. Grouping kind of works in my opinion if you are not too sure what you want, but it is terribly slow and inefficient. They lose cookie points for this.


– Then in the middle they have a section where they display the products. This is very standard, so it kind of gets a pass; however, they lose cookie points again for having products without pictures or with very weird stretching. They are a supermarket, and a big one, so I am sure they can send a guy with an iPhone to take a quick picture.


– The cart has a problem: it does not actually display the product name, only the description. Who thought of this? Not even something as simple as a tooltip!


And then here is the deal breaker for me: BAD SEARCH! As mentioned, search is one of the most misunderstood functionalities in IT. A lot of people make huge mistakes because search can be done with a database (it can), but the end result sucks! And it did suck here.

Let me show you. I want to look for “jabon dial”, which means “Dial soap”. So I just type “Jabon Dial”. Should work, right? It doesn't! Look at the message: “No results found…”. Also, I hate the CAPS. There is one technical reason for it I can think of, but it is pretty dumb.


But why? If you look closely, there are 27 types of “Jabon Dial” if you type only “Dial”:


The problem lies here:
– The person that implemented this application had no knowledge of how search works, which is normal, as search is pretty misunderstood.
– Humans don't search the way engineers want. Expecting the user to search exactly like the engineer intended is just lazy and ineffective.
– So the engineers who created this probably went for a simple exact match in a database search.
– The result is a terrible user experience. I can bet the farm that Amazon would have closed its doors in the 1990s if it had such bad search.
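To illustrate what probably went wrong, here is a rough sketch contrasting a naive substring match (my guess at what the site does) with simple token matching. The product names are invented examples:

```python
# Why "jabon dial" returns nothing while "dial" returns 27 products:
# the query words appear, but never in that exact order.
products = [
    "Dial Jabon Antibacterial",
    "Dial Jabon Spring Water",
    "Jabon de Avena",
]

def naive_search(query, catalog):
    # Exact substring match: "jabon dial" never appears verbatim
    return [p for p in catalog if query.lower() in p.lower()]

def token_search(query, catalog):
    # Token match: every query word must appear, in any order
    terms = query.lower().split()
    return [p for p in catalog
            if all(t in p.lower().split() for t in terms)]

print(naive_search("jabon dial", products))  # [] -> "No results found"
print(token_search("jabon dial", products))  # both Dial soaps
```

A real search engine like Solr goes much further (analyzers, stemming, relevance ranking), but even this tiny change in matching strategy is the difference between zero results and the 27 the user wanted.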

How to fix it? Well, go learn how to use a search engine. And that’s why I created my course, Getting Started With Enterprise Search Using Apache Solr: http://www.pluralsight.com/training/Courses/TableOfContents/enterprise-search-using-apache-solr

The Day We Started to Outgrow Relational Databases

by Xavier

Look around you. Look closer. Pay more attention. What do you see?

When I look around me I can see activity trackers, digital cameras, smart watches, interconnected devices, virtual reality gadgets, wearable technology, smart elevators, energy-saving light systems, intelligent traffic lights, smart cars with over-the-air updates that can gather data on your driving habits, intelligent houses, eco-friendly buildings and more.

All of these generate massive amounts of data. But let's hold that thought for a minute.

Now this is just what's happening around you. What do you have in your pocket or in your hand right now?

Most likely a smartphone. It is your portal to the digital world, and even though it became second nature – pretty much everyone walks around with a phone in their hand nowadays – it is a relatively new phenomenon.

It is highly likely that you use your smartphone constantly to check Facebook, Twitter, or Instagram, or to search the web using Google, among a few other applications. This generates humongous amounts of data.

Let's throw out a few numbers just to put it in perspective. Facebook has 1.6 billion users – yes, that is with a B – millions of whom log in every day to upload millions of pictures, add comments, like posts and perform many other actions. And every action has an impact, but as the amount of data grows, it gets harder to determine what that impact needs to be.

Then we have Twitter, which may be a tad smaller, albeit still plenty of data by any definition. But the beauty of Twitter is not just the human interactions, but what can be extracted from the data.

And Google… well… what can I say? Try indexing the internet and then we can talk about it. Just do it, invite me over, and I will buy you a coffee while you tell me how it went.

How much data do you think is generated daily by all these applications? And besides the applications, remember the many devices I mentioned that also create mountains of data? This means that besides human-generated data, we also have machine-generated data.
A Big Data World

The world has changed in unimaginable ways. Together, human- and machine-generated data bring us into an information explosion era the likes of which the world has never seen. You are living through a digital revolution, and you can consider yourself lucky to be part of it.

Tweets, posts, likes, pictures and stats are very nice. But that is just the tip of the iceberg of what's to come.

There are many other applications that require analyzing these massive amounts of data to help reduce costs, detect fraud, and drive innovation for all mankind. Scenarios that increase profits, decrease costs, spur innovation or help stop the bad guys are all nice. But let's take it up one notch.

Some people are trying to make a difference, like hospitals that are working to cure cancer by analyzing DNA records, comparing them, and finding ways to save human lives. Imagine if one of those lives saved was your son, your daughter, your wife or your parents.

The world has changed around us. We now live in a world of Big Data.
Getting Insights

But data by itself is just data; as I mentioned, something needs to be done with it to get insights. How can this be achieved?

Well, let's first rewind a few years and think about how this was done before. A while back, if you had a “massive” amount of data, you went to your preferred vendor and wrote them a huge check for an equivalently massive machine. Then you wrote another very big check to your favorite database provider for software to run on this machine, and built an application that could consume the data, process it, and give you the answers you needed. You also usually had to limit the amount of data, as you were constrained by the limits of your big box, so you had to throw data away.
Outgrowing Relational Databases

But there were times when a big box was not enough. For example, what if you wanted to index the entire internet? There was no box big enough for that.

There were also more scalability constraints, like performance. If you had 1 terabyte that took 1 hour to process and then you added another terabyte, it would probably take almost twice as long, or even more.

This was a problem that needed a whole new way of being solved. It all started when Google published papers circa the early 2000s explaining how they solved this problem with GFS and MapReduce.

And then magic happened. Doug Cutting and Mike Cafarella read the papers, and since they had to solve the same problem, they got inspired and created Hadoop!
Welcome to Distributed Computing at Its Best

Hadoop took a different approach to solving Big Data problems. Instead of one big box, it relies on clusters of many smaller and way cheaper computers, also known as commodity hardware.

The data is distributed among all these computers and processed locally. In hindsight it sounds like common sense: instead of taking the data to where it is processed, Hadoop takes the computation to the data. Each node in the cluster holds its own portion of the data and does the computation locally.

Agreed, a big box is probably more reliable than a bunch of commodity servers, so a few of them might fail during processing. This is not a problem, as Hadoop has data redundancy: if one server fails, there is a copy of the data replicated on another server that can pick up the work.

And so we have distributed, resilient, dependable and efficient Big Data systems that are helping us change the world.
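The map-and-reduce idea behind this can be shown in miniature with plain Python (no Hadoop involved): each "node" computes a partial result over its local partition, and only the small partial results travel to be merged:

```python
# Word count, MapReduce-style: map locally, reduce the partials.
from collections import Counter
from functools import reduce

# Three partitions of a data set, as if spread across three nodes
partitions = [
    ["big", "data", "big"],
    ["data", "cluster"],
    ["big", "cluster", "cluster"],
]

# Map side: every "node" counts only the data it already holds locally
partials = [Counter(part) for part in partitions]

# Reduce side: combine the small partial counts, not the raw data
totals = reduce(lambda a, b: a + b, partials)

print(totals["big"], totals["cluster"])  # 3 3
```

The raw words never leave their partition; only the compact `Counter` objects are merged, which is exactly the "take the computation to the data" trick scaled down to a toy.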
Data drives the modern world. But who drives the data? That is what Cloudera is here for.
Ask Bigger Questions!

Big Data… Big Deal? Big Hype?

by Xavier

Big Data

Big deal?

Big hype?

Or big change in our world?

I think that the answer can be all of the above. “Hype” you might be thinking? Well, here is the deal. Our world has changed in unimaginable ways. The amount of information created daily is reaching levels that just a few years ago would’ve been considered science fiction or even plain old crazy.

Lots and Lots of Data

To make it even more interesting, a lot of it is unstructured data. That can be kind of a problem if we think about it, because the success of relational databases has taught many of us to think in a columnar and relational way.

And this is not bad… at all. It is nice to have all your data and metadata organized neatly. You can use SELECT, JOIN, WHERE, GROUP BY and more to get exactly what you need.
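As a tiny illustration of that relational mindset (the table and values below are made up), here is the idea with SQLite:

```python
import sqlite3

# An in-memory database with a small, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)],
)

# SELECT, WHERE and GROUP BY give us total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount > 0 GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 17.5), ('bob', 5.0)]
```

This works beautifully when the data fits a schema; the trouble starts when it doesn't.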

But the success of relational databases can also create a blind spot for many. Just a few days ago I was talking with the VP and cofounder of a migrations and artificial intelligence software company that has seen success (as well as a few failures, or learning experiences) in several world-class projects. They have lots of data coming out of their automated code conversion tools, and what are they doing with it? They are normalizing it into a database.

I don’t think it is a bad approach; however, it is not the one I would take. Long story short, I would store the logs as-is, in their raw format, and then use any of the available projects to analyse them in multiple ways, looking for key points, failures, trends and more. But what you do with the data is the topic for another post or a Pluralsight training. Let’s go back to our main point.
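To sketch what I mean by working on raw logs (the log format and field names here are invented for illustration), you can keep the lines untouched and still pull failures and trends out of them later:

```python
import re
from collections import Counter

# Raw log lines kept as-is; this format is made up for illustration.
raw_logs = """\
2016-08-18 10:01:02 INFO conversion started file=a.sql
2016-08-18 10:01:09 ERROR unsupported syntax file=a.sql line=42
2016-08-18 10:02:33 INFO conversion finished file=b.sql
2016-08-18 10:03:10 ERROR unsupported syntax file=c.sql line=7
""".splitlines()

# One of many possible "models" of the same raw data: failure counts.
pattern = re.compile(r"ERROR (?P<reason>[\w ]+?) file=(?P<file>\S+)")
failures = Counter(
    m.group("reason") for line in raw_logs if (m := pattern.search(line))
)
print(failures)
```

The point is that nothing was lost by not normalizing up front: tomorrow the same raw lines can be re-parsed into a completely different model.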

Mountains of data are being generated daily, and the amount will not just continue to grow: it will explode.

Unstructured Data “Just Happens” 
If you had to structure all your data, can you imagine what the cost would be? Just go ask your manager for an Oracle system and some servers to process all of your web server logs and put them in tables. The cost would be exorbitant.

And besides cost, sometimes you may not even know the structure of your data up front. That is one of the beautiful parts of Big Data: you can just store your logs in raw format and come back later to do your work, modelling your data in different ways. And what if you have too much data and the process is taking longer than expected?

Well, just add a few more servers and get the job done in parallel. Hadoop runs on commodity hardware, so you can get many relatively inexpensive machines to work together and process your data according to your needs.

The Cloud and the Bar
And even better, remember “the cloud”! A few years ago, if you were a startup and needed beefy computing power, you needed a lot of upfront cash to cover the expense. Now, with AWS and Azure, we can spin up a few virtual machines, get a cluster up and running, crunch the data, get the results, turn the machines off, and pay only for the time we use.

And this change has lowered the entry bar for innovation. Now many brilliant ideas can be tested, and theories analysed, at a much lower cost, benefiting all mankind. For example, it is possible to run analyses on medical treatments to help cure cancer and many other diseases. Sometimes the answers to hard questions lie right there in the data; they just need to be discovered.

Hype or Go Figure This Hadoop Thing Out
But what about hype? Let me make this clear: I don’t think Big Data is hype. I do think there is a lot of hype around it, and even though we are able to do great things with Big Data, the general public does not yet fully understand what can be done and how. That is why I have taken on a personal mission to help developers, and the public in general, understand Big Data (and Search).

So then it is time to ask ourselves this question:

What Are My Choices for Getting Started with Big Data?

SolrNet vs. SolrExpress

by Xavier Comments: 1

I have been working with Solr for a while, mainly from the .NET world, and I basically love it. I use SolrNet, which I think is a very mature and stable library. I was asked today if I have ever used SolrExpress and whether I recommend it over SolrNet.

The short answer is no, I have not used it, so I can’t give a fact-based recommendation. But looking over the source code of both libraries, my opinion is that SolrNet is still more complete, so I still believe SolrNet to be the more sensible choice.

It is worth mentioning that this is a biased point of view, as I have used SolrNet multiple times and it really has made my life a lot easier.

Having said that, besides using it several times, I have authored a few things around Solr and SolrNet. It works fine and I know it pretty well. It gets the job done, it is pretty mature, and it is almost complete (pending SolrCloud support and a few minor things, like a breaking change on collation).
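Under the hood, every Solr client, SolrNet included, talks to Solr over HTTP against endpoints like /select. As a rough, language-agnostic sketch in Python (the host, core and field names below are made up), a select query looks like this:

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, rows=10, fields=None):
    """Build a Solr /select URL; Solr answers plain HTTP requests."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # fl restricts which stored fields come back in the response.
        params["fl"] = ",".join(fields)
    return f"{base_url}/select?{urlencode(params)}"

# Host, core and field names are made up for illustration.
url = build_select_url(
    "http://localhost:8983/solr/products",
    "name:laptop",
    rows=5,
    fields=["id", "name", "price"],
)
print(url)
# http://localhost:8983/solr/products/select?q=name%3Alaptop&rows=5&wt=json&fl=id%2Cname%2Cprice
```

A client library's value is in wrapping this plumbing with strongly typed mapping, which is exactly what SolrNet does for .NET objects.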

Some of the things I created

I created a Solr training for Pluralsight


https://www.pluralsight.com/courses/enterprise-search-using-apache-solr


And a SolrNet training for Pluralsight

https://www.pluralsight.com/courses/implementing-search-dotnet-applications


I wrote a book on Solr and SolrNet for a company called Syncfusion, as part of their Succinctly series

https://www.syncfusion.com/resources/techportal/ebooks/apachesolr


I’ve also done internal trainings, presentations and webcasts on Solr + SolrNet
http://www.meetup.com/Atlanta-Net-User-Group/events/222161640/



SolrNet does not yet support SolrCloud in the main repository. There is a fork that already adds it, but our current project does not use forks, only the main repository. If that is not a blocker for your customer, go ahead and use the fork; or, like in our case, just use a load balancer for querying and a call to the ZooKeeper API to find the leader for indexing.
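That leader lookup amounts to reading the collection's state from ZooKeeper and picking the replica flagged as leader. A rough sketch in Python (the state.json content shown is a trimmed, made-up example, and actually fetching the znode with a ZooKeeper client is omitted here):

```python
import json

def find_leader(state_json, collection):
    """Find the leader replica's base_url in a Solr state.json blob.

    Solr keeps this JSON in ZooKeeper under the collection's state node;
    fetching it (e.g. with a ZooKeeper client library) is not shown here.
    """
    shards = json.loads(state_json)[collection]["shards"]
    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true":
                return replica["base_url"]
    return None

# A trimmed, made-up example of what the cluster state can look like.
state = json.dumps({
    "mycollection": {"shards": {"shard1": {"replicas": {
        "core_node1": {"base_url": "http://10.0.0.1:8983/solr", "leader": "true"},
        "core_node2": {"base_url": "http://10.0.0.2:8983/solr"},
    }}}}
})
print(find_leader(state, "mycollection"))  # http://10.0.0.1:8983/solr
```

With the leader's base URL in hand, indexing requests go straight to it, while queries can keep flowing through the load balancer.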

Hope this helps.