Harry's Tech Space

Read my experience with different products and technologies.

Category: Analytics

Articles chronicling my journey into analytics.

Scientific Research Process: What Is It?

Scientific Research Process
(Source: flickr)

Many a time, traders describe their process of setting up a trade as scientific. But to call something scientific, the technique has to go through a proper scientific research process: it must stand its ground under rigorous review by tough human reviewers, and it must survive the acid bath of statistics. The scientific research process, much like a computer algorithm, is a linear one. Here are the steps to follow in the scientific research process.

Steps in Scientific Research Process:

  1. Problem definition.
  2. Review of previous works related to the problem.
  3. Setting up the research apparatus and formulating the hypothesis.
  4. Running the research and collecting the data.
  5. Analysing the collected data and validating the hypothesis.
  6. Presenting the results for expert review.

The above list is how research happens. These steps can also be seen in a mythological context too: the 6 steps correspond to the What, How and Why of a thing (also read: the What How Why framework here).

Step 1: Problem Definition:

The first step in research is the definition of the problem. The problem is nothing but a statement of what is really happening in the environment. A problem definition can be something like “Why are the sales of Sunfeast biscuits so low in the city of Dharwad?”. The problem can also be “How do we measure the oversoldness of a stock?”, and so on. In short, whatever the so-called pundits call scientific starts with a problem definition. The key criterion of a good problem definition is that it should be based entirely on the subject and should not make any reference to statistics.

Step 2: Review of Previous Works:

Reviewing previous works sounds odd to many, who feel their work is unique and will not have been covered by previous research. When a review is done, it sheds light on the variables and the interplay between them, which can have a bearing on your research. The review also sheds light on biases that may creep in: if you haven’t read about the efficient market hypothesis and start off with your research on trading techniques, it is sure to be a biased work. It is the review of prior work that differentiates a good researcher from a bad one. The review also helps in understanding the subject properly and in forming the hypothesis (obtained by subdividing the problem statement into small, measurable chunks).

Step 3: Setup and Hypothesis Formulation:

In this phase we set up the research apparatus, i.e. decide on the things to measure, how to measure them, and so on.

As the first step, the problem statement is subdivided into chunks called hypotheses. A hypothesis is a pair of statements that describes a small part of the problem: one is called the null hypothesis and the other the alternate hypothesis. The null hypothesis states that there is no relation between the variables, whereas the alternate hypothesis states that there is a relation. Every hypothesis is made of variables which are measured to determine which of the two statements holds true. The main feature of the pair is that only one of them can be true.
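As an illustration, here is a stdlib-only Python sketch (the coin-flip numbers are made up): the null hypothesis says a coin is fair, the alternate says it is biased, and a two-sided binomial test decides between the two.

```python
from math import comb

def binomial_p_value(n, k, p=0.5):
    """Two-sided p-value for k successes in n trials under the
    null hypothesis that the success probability is p."""
    def pmf(i):
        # probability of exactly i successes
        return comb(n, i) * p**i * (1 - p)**(n - i)
    observed = pmf(k)
    # sum probabilities of every outcome no more likely than the observed one
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed + 1e-12)

# Null hypothesis: the coin is fair. Alternate: it is not.
# 65 heads in 100 flips is unlikely under the null, so we reject it.
print(binomial_p_value(100, 65) < 0.05)  # True
```

A p-value below the chosen significance level (5% here) means the null hypothesis is rejected and the alternate accepted; only one of the two can hold.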

Apart from preparing hypotheses, you are also supposed to identify the proper audience, called the sample. (The sample is the set of people on whom the research is done; it is selected from the population and hence is a subset of the population.) The process of selecting the sample is called sampling, and it is decided in this step itself.

You must also decide how to go about collecting data without letting bias creep in. You segment the sample to allow pure randomness, so that biases due to concentration do not creep in while collecting data. Bias can put the validity of the research in question, so one needs to be careful about it. You can call this sampling and prep work the preparation of the statistical model, as it lays out the model for data collection. Only after a proper model is set up can one go about data collection.
(Know more about statistical hypothesis testing on Wikipedia; it explains the model too.)
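As a sketch of this prep work (the city names and counts are hypothetical), a stratified random sample drawn with Python’s standard library keeps any one segment from dominating the data collection:

```python
import random
from collections import defaultdict

random.seed(7)  # fixed seed so the draw is reproducible

# hypothetical sampling frame: (respondent_id, city) pairs
frame = [(i, "Dharwad" if i % 2 else "Hubli") for i in range(1000)]

def stratified_sample(frame, key, per_stratum):
    """Draw an equal simple random sample from each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[key(unit)].append(unit)
    sample = []
    for units in strata.values():
        sample.extend(random.sample(units, per_stratum))
    return sample

sample = stratified_sample(frame, key=lambda u: u[1], per_stratum=25)
print(len(sample))  # 50 units, 25 from each city
```

Drawing the same number of units from each stratum is one simple way to avoid the concentration bias mentioned above; proportional allocation is another common choice.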

Step 4: Data Collection:

Data collection is the shortest step of the research process. Here the research is conducted and the data gathered. In research related to the humanities, a questionnaire is often given to people to answer; there are other types of experiments to collect data from human participants, like focus groups, blind tests, etc. For non-humanities subjects like physics, the experiment is run on machines and the data is captured; the machine on which the experiment is conducted is part of the experimental setup, and the data collected from it is called the sample. The data analysis step follows data collection.

Step 5: Data Analysis:

This is the step where the acid bath of statistics happens. It is one of the longest and most important steps of the scientific research process. Here the statistical analysis of the codified data happens, and based on that analysis the results are published.

Once the data is collated from the questionnaires, it needs to be cleaned and made machine-ready. For example, if the questionnaire had rating scales, the answers would range from strongly agree to strongly disagree. These kinds of Likert-scale answers cannot be fed into machines directly, so they need to be codified, e.g. strongly agree = 5 and strongly disagree = 1. (Note that the coding strategy is pre-decided in the research setup phase; here it is only implemented.)
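A minimal sketch of such coding in Python (the 1-to-5 scheme mirrors the example above; the responses are made up):

```python
# pre-decided coding scheme from the setup phase
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# raw questionnaire answers, as collected
responses = ["strongly agree", "agree", "neutral", "strongly agree"]

# codified, machine-ready data
coded = [LIKERT[r] for r in responses]
print(coded)  # [5, 4, 3, 5]
```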

After the coding and data entry are over, software like MATLAB, SPSS or SAS is used by researchers to run the various statistical analyses on the data. Things like regression and factor analysis are done in this phase to validate the hypothesis. The analysis is called hypothesis testing, since these analyses are done to determine which hypothesis is true. Based on the results the software spews out, the valid hypothesis is determined.

Step 6: Results & Review:

The final step is publishing the results and their subsequent review by experts. Based on the experiment conducted, the valid hypothesis is collated and the result is published. If one recalls, the data collection for gravitational waves ended way back in September 2015, yet it took a lot of time to run the analysis and publish the results. Once the results are written up, they go through a panel which vets whether the research was done in an unbiased way; once the panel is fairly confident of the absence of bias, the research is published.

There are some caveats to this scientific research process. In the process described above, the researcher runs the experiment to confirm a gut feeling; for example, the discovery of gravitational waves confirmed whether gravitational waves exist or not. This kind of confirmation of a gut feeling is called confirmatory research. There is also another branch called exploratory research, which follows a different process: instead of hypothesis testing, it just measures the variables and tries to build relationships between them. This kind of research falls under the realm of big data.


The above steps describe the scientific research process. Statisticians in a company follow these steps when they do market research or other kinds of R&D work. To explore the scientific research process in more depth, you can read the book by Thomas Davenport called “Keeping Up with the Quants”.
Here are the Amazon links to its hardcover and Kindle editions.

If you are a trader, then remember that trading is an art, not a science. That is because the first rule of trading is “be flexible”, and science is never flexible. Since the above steps are very laborious, don’t call every gut feeling of yours scientific; you can call your MBA project scientific, though. ;)


The shenanigans of Context in the stats

WARNING: Your brain may melt trying to understand the shenanigans of context in the already difficult field of stats. So here is my article on understanding context and statistical analysis.

Most of my shenanigans with analytics happened after I joined SKDRDP. Since I had learned analytics by following blogs like KISSmetrics, LunaMetrics and Avinash Kaushik, I was filled with preconceived notions like “bounce rate is a metric of badness” and “try to make visitors stick on the site longer”. Despite reading them for most of my MBA, I couldn’t answer the what and why of these metrics. These preconceived notions ultimately led to my undoing, as I knew only the numbers, not the environment in which they operate. Hence, whenever you want to interpret numbers, understand the environment. To understand the dynamics of the environment, I had to resort to the books of Devdutt Pattanaik (Business Sutra: A Very Indian Approach to Management). Its metaphor of Lakshmi shed light on the “What How Why” framework.

What, How and Why manifest in nature in various ways: as Lakshmi, Durga and Saraswati, or as problem statement, research methodology and research findings. Other manifestations are knowledge, skill and experience. Without getting confused, let’s dive into the framework.

Knowing the Context: “What?”

The question of “what?” pertains to our vision. The question to be asked here is “What’s happening?”. To understand what’s happening, you have to see it, so the primary skill required here is being unbiased while seeing the ground realities. In short, understand what’s happening, but don’t use your thinking and analysis skills here.

There are various manifestations of the question “what?”. Below are some:

  • “What am I trying to achieve?”: here it is called the Problem Statement.
  • “What is happening around me?”: here it is called an Observation.
  • “What is the fund supposed to achieve?”: here it is called the Fund’s Goal.
  • “What does this number mean?”: here it is called Knowledge.

The question of “what?” normally pertains to events at a particular point in time. It only reveals your understanding of the events.

Dynamics of the Context: “How?”

The question of “How?” pertains to our skill of inquiry. The question to be asked here is “How did it happen?”. To understand how something happened, you have to see the entire narrative or scene; “How?” pertains to the process being undertaken. In short, understand how things are shaping up or how they happened, but don’t try to infer or jump to conclusions.

Here are some of the How’s:

  • “How will I reach my goal?”: this is called a Plan.
  • “How did it happen?”: this is called a Process Explanation.
  • “How will the experiment be conducted?”: this is called the Research Methodology.

The question of “How?” tells about the events that happened, i.e. about things happening over the course of time. The biases of methodology creep in during this stage, so be aware of the limits and biases.

Playing the Context: “Why?”

The question of “Why?” pertains to our understanding. The question to be asked here is “Why did it happen?”. Here it is necessary to apply our thinking power, and one can do so only after answering the “What?” and “How?”. Hence it is important to get the event and the process correct before diving into an understanding of the event.

When the what, how and why are combined together, we call it research. Research is considered important because it sheds light on the vast ocean of context and tries to explain it definitively. As pieces of research pile up on one another, they create a body of knowledge.

Decision Making and Google Analytics

Decision making is the most frequent activity in the world of websites, be it a trivial decision like “which colour should the website and its elements be?” or an important one like “how do we target the site’s specific audience?”. If this decision making is entirely gut-based, then God bless you: you are heading into a minefield. If the decision making is based on validating your gut feeling with data, then you are actually walking the path of statistical analysis. Here tools like Google Analytics play a vital role.

There are lots of analytics offerings, and they differ in the metrics they provide. Since I started out as a blogger on Google’s Blogspot, I was most accustomed to Google Analytics. Hence, within 10 days of joining, I had installed Google Analytics on each and every page of the site, and it started gathering data from 21st March 2014. As the data poured in, I had the power to make decisions on which elements of the site to keep and which to eliminate. This power, obtained by installing Google Analytics, laid the foundation of the Data Era at SKDRDP.

Data Metrics in Google Analytics

The journey with Google Analytics was not easy. I had to use a powerful tool like Google Analytics on the totally unknown terrain of a company website. Corporate sites are static compared to blogs and social media, and this static nature calls for a keener look at optimization, as each and every move counts. A basic knowledge of the metrics plays a key role for static websites. Some of the important metrics that give a detailed overview of a site are:

  • Sessions: This metric tells how many times the site was loaded. In the case of SKDRDP, the Sessions metric tells us how many times in a month/quarter/year the website was browsed.
  • Visitors: This metric tells how many unique visitors loaded the site. If a person loads the site twice, the visitor count is 1 but the session count is 2.
  • Page views: This metric tells how many times the pages of the site were loaded. If a person browses through 10 pages of the site, the page view metric will be 10.
  • Bounce Rate: This metric is expressed as a percentage. It is the percentage of sessions on a particular day in which the user browsed only 1 page, i.e. the percentage of sessions with a page view count of 1. If on a given day there are 100 sessions, and 50 of them loaded only the index page before closing, the bounce rate will be 50/100, i.e. 50%.
  • Time on Site: This metric tells how much time users spent on the site on average. If one person browses the site for 10 minutes and another for 5 minutes, the time on site will be 7.5 minutes (the average of the 10- and 5-minute sessions).
  • Pages per Session: This simple metric is obtained by dividing page views by sessions. It tells us how easy it is to browse the pages. Its lowest possible value is 1; it can never go lower than that.
  • % of New Visits: This metric is obtained by dividing new visitors by sessions. A higher number means visitors to the site are not returning, and hence are not loyal. A lower number means you have a bunch of loyal followers, but new people aren’t reaching you.
  • Referral Source: This tells you who sent the traffic to your site. The 4 channels here are Organic Search, Direct, Referral and Social.
    Organic Search means people reached the site by searching on engines like Google.
    Social means people reached the site by clicking on a link shared on social media platforms like Facebook.
    Direct means people opened the site by clicking a bookmark in their browser or by typing the site address directly into the address bar.
    Referral means people reached the site by clicking a link to it shared elsewhere on the web. For example, if you click the link to my personal blog www.harshankola.in from skdrdpindia.org, your click is counted as a referral from skdrdpindia.org.

The above are very important metrics, and Google Analytics has a host of others. The concepts of conversions and goals are very powerful too. In the case of SKDRDP, not even all of the above metrics are considered; only a subset of them is used for decision making.
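The derived metrics above follow directly from the raw counts; here is a small sketch of the arithmetic, with invented numbers:

```python
def derived_metrics(sessions, pageviews, new_visitors, single_page_sessions):
    """Derived metrics as defined above; all inputs are plain counts
    for the chosen reporting period."""
    return {
        "pages_per_session": pageviews / sessions,
        "bounce_rate_pct": 100 * single_page_sessions / sessions,
        "new_visit_pct": 100 * new_visitors / sessions,
    }

m = derived_metrics(sessions=100, pageviews=250,
                    new_visitors=80, single_page_sessions=50)
print(m["pages_per_session"], m["bounce_rate_pct"], m["new_visit_pct"])
# 2.5 50.0 80.0
```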

Analytical Person + Google Analytics = Data Era

Analytics plays a vital role in decision making. The Audience, Acquisition and Behaviour metrics in Google Analytics provide the answers to the Who?, Where? and How? of site interaction. These metrics are also the major tabs in Google Analytics.

The main tabs on Google Analytics

  • The Audience tab gives detailed information on who the audience of the site is. It includes things like their demographics, interests and location, and technology aspects like browser, OS, screen size and resolution.
  • The Acquisition tab gives the referral metrics, i.e. where visitors are coming from. It details the channels of acquisition, namely Organic Search, Referral, Direct and Social, with a breakdown of all the traffic sources like Google, Facebook, Direct, Pinterest and referral sites, and also keywords, campaigns and AdSense referrals.
  • The Behaviour tab gives data on how interactions happen on the site. It covers important things like content loads, how many exits happened from a particular page, site load time, site searches, site events, A/B experiments and click heat maps.

The above 3 tabs are also the ones most often used in decision making here. Another metric I use while making changes to the site is the real-time interactions happening on it. This real-time data tells how visitors are flowing through the content: after the index page, where is their second click, which is the 3rd page to load, and so on.

The Audience Metrics

The first thing to load when Google Analytics is opened is the Audience Overview. These metrics are also often used for the site’s major decisions.

Basic audience metrics in Google Analytics

The metrics I personally use for making decisions are Sessions, Pageviews, Pages / Session and Avg. Session Duration. These are the metrics I push up all the time, because they assure me that I am reaching people well. All of these are period metrics and are affected by the period in consideration. In the graph above, the sessions are shown as a total for the entire month of August, and the daily totals are plotted in the graph as well.

By simply visualizing the metrics, you come to know what is happening with the site.

All time session count in Google Analytics

For example, look at the session chart above for the date range when I took over control of SKDRDP’s website. You can see that the session count increased through May and June, fell in July, and had a series of rough rides later on. The other metrics, like time on site and pages per session, are on a downward slope. Based on these data you can make the judgement that things are bad, contrary to popular opinion. And all of this is just the tip of the iceberg if you want to dig deeper.

The Traffic Plays

Second in command to the audience metrics are the traffic metrics. In the traffic metrics you see which sites play a role in sending visitors to you.

Site Traffic Metrics in Google Analytics

In the image above you can see how big the organic search channel is. That channel is so large because of the role Google plays in sending traffic to the SKDRDP site. The referral and social numbers are often incorrect, because some of the Facebook referrals are tagged as referral instead of social; hence I often ignore this metric. With Google being the biggest referrer, SEO plays a key role in the design of the website.

Normally I compare the referrers on pages/session and avg. session duration, which gives me a fair idea of which source sticks most. As for the bounce rates in the pic above, you can deduce that such bounces are common from the referral and social channels, as links to the site rarely get shown on other sites and social media.

The flow of people on site

The third source to dig into is the Behaviour data, detailing how visitors are clicking through your site! The behaviour data is a treasure trove of content popularity, experimentation, heat maps and other cool features. For the SKDRDP site, the metric used most is content popularity. The most popular content of SKDRDP is actually the home page ;) .

Pageviews and Behavior metrics on Google Analytics

In the diagram above, the most loaded page is the home page (index.html). Page views and avg. time on page are the metrics used for designing things. Heat maps and other features are used sparingly, and only when necessary.

It’s the world of Realtime

The real-time metrics are used to understand how users are browsing the site right now. The view is impersonal, but some important interaction data is shown, like the referral source, IP location and currently watched pages. If you see some abrupt behaviour, like exits from the home page, you come to know something is amiss with the site and can go check what the problem really is.

Realtime user data in Google Analytics

In the diagram above you can see that users are currently spending time on the home page as well as the About Us page. You can also deduce that they were referred to the site by our subsidiary SIRI Gramodyog’s site. The blue bar in the rightmost corner indicates that the site was opened in the past 60 seconds; the page loads in the past 30 minutes are shown next to it. This also shows fairly active site loads in the past 30 minutes.


The journey with Google Analytics brought us to the frontier of data-based decision making. With its introduction, you come to know which features are getting traction, and which things are getting clicked and which are not. Based on this you can make decisions about how things are going, and whether the guy you hired to smooth things out is actually making any difference. Without data it is all gut feeling; with data you can take decisions that have a higher chance of impact. By putting this data through statistical analysis, you glean even more insight from it. With the era of Big Data taking hold, it is necessary to know the control levers of the data industry. Analytics is the first step in this journey.

You are reading an article by Harsha Ankola, originally posted on Harsha’s Tech Space. If you have enjoyed this post, be sure to follow Harsha on Twitter, Facebook and Google+.

Resurrecting the Social for SKDRDP with Facebook Page Insights

In my previous article, My one month stand with SKDRDP, I explained how I went about bringing the SKDRDP website on track, and the ordeal I had to go through to introduce analytics.

This article deals with the dawn of Social at SKDRDP. Technically speaking, the story of this article started on 7th April 2014, the day I created the FB page for SKDRDP. The FB page laid the foundation for what I call the “Social Era”.

The dawn of Social Era

The social era started with me opening the FB page for SKDRDP on 7th April. I decided to open it because the main aim of the ED was to reach more people, and because of the ability to share photos of the huge array of programmes that happen at SKDRDP. On the website, neither my skill set nor the site’s disk quota was sufficient to create a photo gallery, so I decided to go with FB, where I could share unlimited photos and also provide a link/badge for site visitors to reach them.

Some days after I was inducted, a circular was sent to all project offices announcing my hiring, along with directions for sending photos of programmes. By April the sending had gathered steam and I was getting mails with photos of various programmes. To clear off the backlog of photos, I unilaterally decided to create the SKDRDP page and start uploading photos to it.

First Steps on Facebook and Facebook Page Insights

Once I had set up the page and filled in all the requisite information, the main roadblock came upon me: users. With lots of SKDRDP pages floating around on FB, it was difficult to establish a strong user base. Hence I sent page suggestions to many of my personal friends. Once my personal friends liked the page, it started getting noticed by some outside people too. With people noticing the page, I had conquered the first roadblock.

The second roadblock was content. Since the page was set up with the ultimate intent of sharing photos of the various programmes, the content was already with me in the form of photos. I created various albums and uploaded the photos to them. The photos were being shared immediately, which kept the reach and engagement metrics low, but the album-based division of photos helped make things more manageable initially. With the albums and the huge backlog of photos at my disposal, I was able to ride over the content wave, but I was about to meet a much bigger wave called analytics.

The era of analytics actually started in the middle of the content-conquering mission. Since Page Insights requires a minimum of 30 likes, I had to wait a little; once I had 30 likes, I claimed the special URL and activated the analytics. I was not much hooked on analytics in the beginning, but later on I started using it often. By the end of April the page had around 55 likes, but then it was noticed by the daughter of Dr. D. Veerendra Heggade, and that very day the page netted 50 likes to reach 100 page likes. With her liking the page, it started getting noticed by family members of our founder and also started getting lots of engagement. When I searched for the reason the page was able to reach our founder’s daughter, I came to know that the post scheduling I had started some days earlier was the thing.

My Social Analytics a.k.a Facebook Page Insights journey

I found the post-scheduling technique simply by luck. I stumbled across a feature in Facebook Page Insights that tells you when the page fans were most active on each day of the past week. Based on that, I would schedule the posts.

The top bar of the Facebook Page Insights screen looks like this.

Top Bar of Facebook Page Insights

The basic insights I want from the page appear here. The Overview displays all the basic stats, showing the action of the past 7 days. These basic stats are enough to indicate the overall health of the page.

The screenshot of the topmost row in Facebook Page Insights looks like this.

Basic Stats section in Facebook Page Insights

These are the most basic data in Facebook Page Insights. Leftmost is page likes; the 2nd number is how many people liked the page this week. I give more importance to post reach, in the middle column. Post reach indicates how many people viewed posts from this page: the number 650 means that in the past 7 days, 650 people saw the posts from this page. The number in red indicates the percentage of decline from the previous week. The engagement column shows the engagement metrics, i.e. how many likes, comments, shares, or mere mouse clicks the posts received. People engaged tells how many unique people engaged with the posts (i.e. clicked, liked, commented or shared).
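The red figure is just a week-over-week percentage change. A tiny sketch (650 is the reach quoted above; the previous-week figure of 1000 is hypothetical):

```python
def week_over_week_change(current, previous):
    """Percentage change versus the previous week; a negative value is
    a decline (the number shown in red on the Insights overview)."""
    return 100 * (current - previous) / previous

print(week_over_week_change(650, 1000))  # -35.0
```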

All of the above tells the health of the page. Below it is the section in Facebook Page Insights telling the health of individual posts.

Recent Posts section in Facebook Page Insights

Normally I don’t give much attention to this section; I just glance over the reach and engagement of the posts. For the numbers that jump out, I dig deeper into why they happened. In the pic above, the post with a reach of 521 did so well because it was an advertisement for agro-equipment renting and was shared by 4 people. Photos of our founder, Dr. D. Veerendra Heggade, also get lots of reach and engagement.

The next section of Facebook Page Insights compares us with other pages we consider our competitors.

pages to watch section in Facebook Page Insights

This section compares the pages I have chosen to watch. SIRI is a subsidiary of SKDRDP and sells the products made by rural women. I often compare this page with Greenway Grameen Infra (reason: I simply love this company). The posts this week are actually very few; normally it races to 30 posts in busy weeks and averages around 18 per week. Even the engagement metric is understated in the pic: normally it stays around 100 for SKDRDP. For Greenway Grameen Infra, if Neha Juneja’s interviews are shared, their engagement runs to 100 with ease; for smaller mentions of the company or its founders, the engagement rates are low.

The above are the constituents of SKDRDP’s Overview page. The next most visited tab in Facebook Page Insights is the Posts tab, used to determine the ideal time to post.

scheduling tab in Facebook Page Insights

In the pic above, the ideal time to post is 8 pm, the highest point of the peak. This graph plots the average number of page fans who were active during a given time slot. The number 56 indicates that, on average over the past 7 days, 56 page fans were active on Facebook at 11 am. The weekday numbers show how many fans were active on each day of the past week.
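Picking the slot amounts to taking the arg-max over the hourly averages Insights shows. A sketch with made-up numbers (56 at 11 am and a peak at 8 pm, as in the screenshot):

```python
# hypothetical hourly averages of active page fans over the past week
active_fans = {9: 30, 11: 56, 14: 41, 17: 48, 20: 62, 22: 37}

# the hour with the most active fans is the ideal time to schedule a post
best_hour = max(active_fans, key=active_fans.get)
print(best_hour)  # 20, i.e. 8 pm
```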


This was my journey with SKDRDP and its Facebook page. The page and I have come a long way. It is this social era, which began with the dawn of the FB page, that transformed things around. Looking at the success of the page, the temple administration has started a grand project to bring all the Shri Dharmasthala Manjunatha Temple properties under one control. All the educational institutions of the temple and all other initiatives are being brought under one roof by setting up a grand page linking to the other sub-pages.

The centralization at the temple has helped me continue my journey towards entrepreneurship. I think this post may be my last as an employee of SKDRDP. It is always necessary to keep on learning and widening our world views, because if we return to the same point where we began, we will at least be much wiser.


My 1-month stand with SKDRDP

I joined the Sri Kshetra Dharmasthala Rural Development Project (SKDRDP) on 10th March as Manager of its Website Management department. I was selected for this job after a long interview with the Executive Director (ED for short) in the month of February. The ED’s primary gripe was that despite their super-active work, none of their activities were getting onto the website; in short, no one knew what all they were doing.

This problem of theirs made the office of the ED kind of bipolar. At one end sat the ED with his problem of the site not getting updated; at the other end was me, the juggernaut with the will to blow their problem to smithereens. During the conversation, the talk somehow turned to my blog. They asked me to show it and compared it with theirs; my blog was more techy, but also more refined than their site. This charged, contrasting atmosphere between my blog and their site made the poles come closer, and I landed up with SKDRDP.

Joining Events

After I joined, I was given a PC for my work, and I started with a copy of the website. I studied the copy, understanding its control points and data sources, and after some time I had learned all the aspects of the website. There were many problem points in the website that were affecting users who earnestly wanted to know what SKDRDP is. Some of the painful problems related to content length, content visibility, redundancy and, most importantly, user engagement.

In my initial meeting with the ED after I joined, he gave lots of input on all the things that could be done and needed to be done. Whenever he made these suggestions, I used to wonder why such easy fixes were not getting implemented at all. He also constantly used to say that we have lots of users, but neither of us had any estimate of the number. The site initially didn’t even have analytics installed. In the beginning the site was a kind of blind assault: no one knew a bit about the users, and there was absolutely no data regarding them. The site was, for me, like a house whose rooms and residents I didn’t know. This pissed me off, and I decided to bring in my analytics knowledge to track the data, despite it being outside my appraisal purview.

The Dawn of Data Era

Once I joined, I pondered the website, trying to understand it and discover the levers that control its different aspects. Once I was comfortable with it, I started with slow changes. One of the pages was downright stale and was not contributing anything to the site; that was the first page to get axed by me. Once I axed it, I moved ahead to build the era of data into the site. I signed up for a Google Analytics account and installed it on all pages of the site. It took me around 3 days to install analytics on all pages, and by 21st March the entire site was tracked.

Once the site was tracked and I started getting data about the visitors, I got Webmaster Tools linked to it as well, to see keyword data. Based on the keyword data I was able to conclude that the site was indeed popular, with an established user base and keywords. Since SKDRDP is affiliated with the temple, users came to it easily; just by updating the news section, 250 weekly views on average could be garnered for that section.

When I compared the site's data with this blog, the blog was far smaller in traffic. For the site, the only thing that got most of my attention was the content section of Google Analytics and how well the site was doing with its content. User demographics and user acquisition were immaterial, as the site was established with stable traffic. On the other hand, this blog was downright puny, with no established traffic, stable user base, or keywords. This comparison used to give me the impression that "I am nothing". I used to wonder what made them hire me: compared to their tech section and their skill set, I was small and unskilled (I know neither HTML, JavaScript, nor CSS fully). The only thing I knew was user engagement. The tech section was filled with engineers, hence they focused on technical aspects, whereas I was a product of experience and hence focused more on the user aspects of the site. This user focus made me conversant with the ED, who used to see the site from the user's POV. It also brought in the power of Facebook for getting nearer to the general public willing to see our work.

Getting Social

Somewhere in the month of April, I jumped onto Facebook and opened SKDRDP's page. With the entry of Facebook, things started changing. The perennial problem of users being unaware was kicked in the butt. Initially, photos were shared on FB immediately, which came at the cost of reach. Later, post scheduling was employed, which profoundly increased reach and enhanced engagement. The site also went through many cosmetic changes to accommodate the attention it garnered from Facebook.

By getting social and having a strong data underpinning, the site is now independent and on a strong footing. Technically, from now on the site doesn't need my care any further; it can grow on its own.

You are reading an Article by Harsha Ankola, originally posted on Harsha’s Tech Space. If you have enjoyed this post, be sure to follow Harsha on Twitter, Facebook and Google+.

The Great Hadoop Operating System for Big Data

Some days ago I was going through an article on Big Data and couldn't make head or tail of it. I immediately asked IBM and my colleague working at IBM to help me get going on big data. Both of them forwarded me to Big Data University and to free e-books by IBMers.

I was going through the book, actively trying to link the different pieces like HDFS, MapReduce, Hadoop, Pig, Hive, Jaql, ZooKeeper, Flume, etc. Then I realized:

Hadoop and its different components make up a specialized computing system for big data.

The different components of Hadoop closely mirror the components of an OS.

What's a Computing System?

A Computing System (CS) (this term is coined by me, so don't Google it) comprises many components:

  1. Storage system to store the data submitted via Input (E.g. Hard Disks)
  2. Input devices which produce data streams (E.g. Keyboard, Sensors)
  3. Output devices (E.g. Screen)
  4. Operating System for managing the show for users and hardware (E.g. Windows, Mac)
  5. Machine Language aka Machine Instruction Set (E.g. Intel SSE, Intel MMX, Intel VT-X)
  6. High Level Languages for writing apps and scripts (E.g. C, C++, Java, Python)
  7. Application and system software to do the user-defined tasks, as well as manage high-level system activities (E.g. MS Word, Photoshop, CCleaner, antivirus, disk defrag)

The storage system is one of the important abstractions in a computing system. To users, the storage system gives the illusion of a folder-and-file tree structure, but files are actually stored as fixed-size blocks on the hard disk platter. The main function of the storage system is to give users an easy-to-manage abstraction of storage and to handle the difficult process of physical storage all by itself.
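To make the block abstraction concrete, here is a minimal Python sketch (my own illustration, not how any real file system is implemented) of storing a file's bytes as fixed-size blocks and reassembling them into the file the user thinks exists:

```python
BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def store(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a byte string into fixed-size blocks, the way a disk does."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def read(blocks: list) -> bytes:
    """Reassemble the blocks into the continuous file the user sees."""
    return b"".join(blocks)


blocks = store(b"hello world")
print(blocks)        # [b'hell', b'o wo', b'rld']
print(read(blocks))  # b'hello world'
```

The user-facing view (`read`) and the physical layout (`blocks`) are deliberately different, which is exactly the illusion described above.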

Input devices produce streams of data. We can split a streaming input into 3 parts. The first is the source, which produces the data; the second is the filter, which processes the source stream; the final component is the sink, which is the destination of the stream. In the case of a computer, the source is the keyboard key-press data, the filter is the controller which performs complex filtering operations like input validation, and the file in which the data is stored is the sink.
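The source-filter-sink split can be sketched in a few lines of Python (a toy pipeline of my own making, with a list standing in for the file sink and hard-coded key presses standing in for a keyboard):

```python
def source():
    """Source: produces a raw stream of key-press events (hard-coded here)."""
    for key in ["h", "i", "<ctrl>", "!", "3"]:
        yield key


def filter_stream(stream):
    """Filter: validates the input, dropping control keys."""
    for key in stream:
        if not key.startswith("<"):
            yield key


def sink(stream, out):
    """Sink: the destination of the stream; a list stands in for a file."""
    out.extend(stream)


received = []
sink(filter_stream(source()), received)
print(received)  # ['h', 'i', '!', '3']
```

Because each stage is a generator, data flows through the pipeline one event at a time, just as a real input stream would.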

The operating system is the interface between the computer and its users. It performs two functions: managing the hardware resources and providing an interface for users to do their tasks. The UI is a layer over the kernel and doesn't carry much of the complexity of resource management. System management is a tough nut to crack, hence the kernel contains process management and memory management modules, plus loads of technologies and algorithms working behind the scenes to make the system usable.

Every processor comes with its own set of instructions it can understand. These instructions form the basis of assembly language, and the set is called the machine instruction set.

High-level languages were created to make the programmer's job easy. When high-level language programs are compiled or interpreted, they produce sequences of machine instructions. HLLs also provide a higher level of abstraction so that programmers can focus on complex problems instead of optimizing code for the machine.
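Python itself demonstrates this translation step: its compiler turns a one-line function into a sequence of lower-level bytecode instructions (bytecode for a virtual machine rather than a hardware instruction set, but the idea is the same), which the standard `dis` module can display:

```python
import dis


def add(a, b):
    return a + b


# Show the sequence of lower-level instructions the one-line function
# compiles to; the exact opcodes vary by Python version.
dis.dis(add)
```

Running this prints opcodes such as a load for each argument followed by an add instruction, mirroring how a C compiler would emit machine instructions for the same expression.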

Application and system software are created with high-level languages and solve specific user problems. Adobe PageMaker solved publishing problems; MS Excel solved spreadsheet computation problems.

Hadoop Ecosystem and Compute System

A CS and the different Hadoop ecosystem components have a lot of similarity between them:

  1. Storage in a CS is similar to the Hadoop Distributed File System (HDFS). HDFS is a distributed storage system, and in both HDFS and a CS, the way data is actually stored is totally different from how we view it.
  2. Apache Flume is the equivalent of a CS input device. Flume routes data into HDFS; it can be viewed as log data continuously being stored in a file without any user intervention.
  3. Hadoop is like the operating system, managing the show for the user as well as managing resources. The way an OS has many components like resource managers, the kernel, and file systems, Hadoop has different components like Hadoop Core, HDFS, Hadoop YARN, and Hadoop MapReduce.
  4. The MapReduce framework is like the machine instruction set.
  5. Pig, Hive, and Jaql are high-level languages, the way we have C, Java, and Python in a CS. Commands in these languages are converted into corresponding MapReduce jobs.
  6. Mahout, HBase, Cassandra, Ambari, and ZooKeeper are the equivalents of the various application and system software running atop Hadoop.
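The MapReduce analogy can be made concrete with the classic word-count job, sketched here in plain Python (a local simulation of the map, shuffle, and reduce phases; not actual Hadoop code, which would distribute these steps across a cluster):

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word, 1) for word in line.split()]


def reduce_phase(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


lines = ["big data is big", "data is data"]

# Shuffle: group the mapped pairs by key, as the framework does
# between the map and reduce phases.
grouped = defaultdict(list)
for line in lines:
    for word, one in map_phase(line):
        grouped[word].append(one)

result = dict(reduce_phase(w, c) for w, c in grouped.items())
print(result)  # {'big': 2, 'data': 3, 'is': 2}
```

Pig and Hive queries ultimately compile down to jobs of exactly this map/shuffle/reduce shape, which is why the post compares MapReduce to a machine instruction set.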


© 2017 Harry's Tech Space
