pipeline, analytics & visualization

all about data

Recently I have been thinking about what data pipelines, data analytics, and data visualization actually are. Here are the definitions I came up with.

A pipeline implies permanence. You can of course build a new pipeline and tear down an old one, but otherwise it should be running, or ready to be run, with zero coding involved. It often involves I/O, database selection, DevOps, and performance optimization, and its scope is generally the whole organization: data is like water in the sense that it is almost impossible to have a fully independent pipeline system.

Analytics implies a data-centric, insight- or goal-driven activity. It is almost equivalent to data mining and usually comes with a sentiment of "I want the data to do this _". Analytics can be fully automated (an "analytic pipeline"), fully curated by a human (a data analyst or scientist), or anywhere in between.

Visualization implies a human-centric activity whose goal is to communicate data efficiently to a human, with a sentiment of "I want to see what's going on". That holds even when the human is unable to interpret the result correctly (how Web Mercator distorts and misleads is a separate topic, of course). As with analytics, the amount of human work involved can vary.

So in short: if you are accessing data with the same pattern each time, that is a pipeline; if you are accessing data with a specific goal in mind, that is analytics; and if your goal happens to be to see the data, that is visualization. Of course there are overlaps among these three areas, but I think it's good to pick the best-fitting definition (or a combination of them) instead of using the terms interchangeably.
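
To make the distinction a bit more concrete, here is a toy Python sketch of the three roles. Everything in it is made up for illustration: the sample rows, the field names (`day`, `visits`), and the function names (`pipeline`, `busiest_day`, `bar_chart`) are my own assumptions, not any real system or library.

```python
# Hypothetical sample rows standing in for a real source (file, database, API).
RAW_ROWS = [
    {"day": "Mon", "visits": "120"},
    {"day": "Tue", "visits": "90"},
    {"day": "Wed", "visits": "200"},
    {"day": "Thu", "visits": "150"},
]


def pipeline(rows):
    """Pipeline: the same access pattern on every run (read, clean, hand off)."""
    return [(row["day"], int(row["visits"])) for row in rows]


def busiest_day(records):
    """Analytics: goal-driven, 'I want the data to tell me the busiest day'."""
    day, _ = max(records, key=lambda rec: rec[1])
    return day


def bar_chart(records, width=20):
    """Visualization: human-centric, 'I want to see what's going on'."""
    peak = max(visits for _, visits in records)
    lines = []
    for day, visits in records:
        # Scale each bar relative to the largest value (integer arithmetic).
        bar = "#" * (visits * width // peak)
        lines.append(f"{day} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    records = pipeline(RAW_ROWS)   # always runs the same way
    print(busiest_day(records))    # answers one specific question
    print(bar_chart(records))      # shows the data to a human
```

Note how the overlap shows up even here: both the analytics and the visualization step consume the pipeline's output, and the bar chart is really just analytics whose goal is "let a human look at it".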