Almost a year ago, I switched from freelancing to working full-time on a Big Data and AI platform at Kogentix. It was a great opportunity to work in a new domain that I had not been able to touch so far, and the best bit was that it allowed me to continue working from home. Kudos to the management for believing that remote work works just fine.
Moving on to my experience. Well, to be honest, the first three months were hell. I literally felt like a fool during the stand-up calls and brainstorming sessions. I couldn’t understand a single term. I was down to recording all the calls and then googling the terms later to understand what on earth everyone was talking about. There were times I questioned myself: did I make a big mistake by signing up for this? Am I too old to learn new things? What will I produce if I don’t even know what is happening here?
But thanks to my patient co-workers, who answered every single dumb query of mine, I started to understand the big-data terminology. Terms like aggregation, ingestion, clustering, profiling, ETL, modeling and predictive analysis started making sense. And so did the calls and conversations. I don’t claim I know it all; let’s just say my endless stupid queries have come down to an extent. 😛
The more I understood the product, the more I realised that everyone working on it was from the domain: the data scientists, the developers, the architects. They knew the domain, so it was difficult for them to grasp what a common user like me, who knew nothing about data science, was feeling. And what I was feeling was overwhelmed. Every time I opened the application, I would give up after a few screens because of the cognitive overload. Within a few screens I had so much to learn that I would say, that’s it for today, I will handle the rest later. If I had to design for this application, I had to understand it first, and there was no rushing into it. The other important thing was that I had to understand it from a novice user’s perspective, not a power user’s. The power user would know his way around; the job was to accommodate the common user.
From what I figured, we had two types of users:
- The mighty Big Data Scientists, who live and breathe data.
- And the Managers/Analysts, who don’t necessarily understand data science but depend on numbers, trends and graphs to make a lot of major decisions.
While the former would love the terms and explore the data at their own peaceful pace, the latter needed quick insights; what happened in the background didn’t matter. They wanted the trends, numbers and dashboards to make and propose quick decisions.
When I joined, the application was doing well with the data scientists, but it needed a big boost to cater to the non-data-science folks.
Over the year, we made a lot of changes toward making the application a little more user-friendly, consistent and easy on the eyes. Since this has already become a long post, I will cover one case scenario here and share a few more in upcoming posts.
The case we are taking up here is a single screen from one module in the application: Discovery.
The Discovery module is where the user brings in his data and the system gives him some basic trends and analysis on it, letting him visually explore the data, create new analyses from the columns he finds interesting, play with the data, filter down to what he’s interested in, and generate various profiles (new datasets) that make sense to him.
When we began the exercise, the module was at a very nascent stage, offering only a simple column-level analysis and a count of column datatypes as trends. The user did not have the luxury of playing around with the data or dragging and dropping things.
The process of overhauling the discovery module started with:
- Noting down what the users (both the data scientists as well as the analysts) would want to do with their data.
- How to bridge the gap between the two users.
- How to make the screens work for both users’ requirements.
With these in place, we sketched out the basic screens and flows on paper:
Once the basic flow was sketched out, we started digitising the screens for a quick prototype that could be shown to the user base for feedback.
We started with a very basic version of the landing page and went through lots of changes to match the requirements, technical feasibility and scalability.
Observations with this version: we started by assuming that the data would always keep updating (real-time data) or would span a period of time. Hence, a major chunk of the header was given to timestamps, the trend of data ingestion, and filtering by time. This was later removed since timestamps were very difficult to capture; not all the data had a timestamp associated with it. There was historic data that was never going to change. So most of the time our header would have displayed data that gave no insight to the user.
Also, we were deciding the data sample for the user, instead of giving him good control over what data he wants to discover. He had only two options: either a random sample or the full data (which would take a long time to analyse).
From the above observations came the next iteration:
A lot changed in this iteration, right from the navigation to the addition of a left pane where the user could see and add different analyses. We improved the user-friendliness of the action bar above the raw data, explored variations of the mini trends for columns, and added more views for the raw data: grid, list and statistical. Our datatypes were now in two formats: physical and analytical. A common user understands physical, but for a data scientist analytical makes more sense.
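To make the dual formats concrete, here is a minimal sketch of how a column could carry both labels, with a toggle deciding which one to show. The column names and type pairs below are hypothetical, not the product’s actual taxonomy:

```python
# Hypothetical dual datatype labels for a few columns; the pairs are
# illustrative, not the product's actual taxonomy.
DATATYPES = {
    "age":    {"physical": "integer", "analytical": "continuous"},
    "state":  {"physical": "string",  "analytical": "categorical"},
    "active": {"physical": "boolean", "analytical": "binary"},
}

def datatype_label(column: str, view: str = "physical") -> str:
    """Show the physical type to common users, the analytical one to data scientists."""
    return DATATYPES[column][view]
```

The same underlying column metadata serves both audiences; only the label shown changes with the toggle.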
The Summary tab was renamed Profiling, where the user could create various profiles on the existing dataset to meet his requirements. Suppose he is only interested in the sales data of one state; he just applies a filter and saves the resultant data as a profile for future use.
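Conceptually, a profile is just a filtered view of the dataset that gets saved for reuse. A minimal pandas sketch of the “sales for one state” example, with made-up column names and data:

```python
import pandas as pd

# Hypothetical sales dataset; column names and values are illustrative.
sales = pd.DataFrame({
    "state":  ["CA", "NY", "CA", "TX"],
    "amount": [120, 80, 200, 150],
})

# A "profile" is a saved, filtered slice of the original dataset:
# here, only the sales rows for one state of interest.
ca_profile = sales[sales["state"] == "CA"]

# Persisting the profile makes it available in future sessions.
ca_profile.to_csv("profile_ca_sales.csv", index=False)
```

In the actual product the filter and save happen through the UI, but the underlying idea is the same: filter, then persist the result as a new dataset.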
Also, we came up with more options for selecting the data sample at the top. By default the user got a random stratified sample, and had the option to later change it to a filter-based or stratified one.
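For readers unfamiliar with the term, a stratified sample takes the same fraction from every group, so that rare groups are still represented. A quick pandas sketch on a toy dataset (the `segment` column is illustrative):

```python
import pandas as pd

# Toy dataset: 80 rows of segment A, 20 of segment B.
df = pd.DataFrame({
    "segment": ["A"] * 80 + ["B"] * 20,
    "value": range(100),
})

# Stratified sample: take 10% from each segment, so the rare
# segment B still shows up in the sample.
stratified = df.groupby("segment").sample(frac=0.1, random_state=42)
```

A plain random sample of 10 rows could easily miss segment B entirely; the stratified one guarantees both segments appear in proportion.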
Though things seemed better, we did another iteration to accommodate small changes.
Notice that the trend line was taken off the top header, as it still didn’t make sense for most of the data samples we tried, and turned out to be a waste of real estate.
Also notice the small icons on the left pane: we started showing a system icon for system-generated analyses and a user icon for user-generated ones, along with a timestamp for when the analysis was created.
Apart from this, we made a lot of changes to labels and text to make them friendly for a common user, but in places where it made sense we added a toggle to switch between the two user types’ preferences. One example is the datatype format I have already discussed in this post. A second (screens not shared here) was when applying functions to the raw data: we added a toggle to switch to user-friendly terms.
For example, the addstr() function. While it makes sense to a data scientist, for our non-data-scientist users we tagged the functions with user-friendly terms like “add text to the end or beginning”. The user can switch between the two terminologies with a simple toggle.
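The toggle can be thought of as a simple lookup from the technical function name to a plain-language label. A hypothetical sketch; addstr() is from the example above, the other entries are invented for illustration:

```python
# Hypothetical mapping from technical function names to plain-language
# labels; only addstr() comes from the post, the rest are invented.
FRIENDLY_LABELS = {
    "addstr()": "add text to the beginning or end",
    "trim()":   "remove extra spaces",
    "concat()": "join two columns together",
}

def label_for(function_name: str, friendly_mode: bool) -> str:
    """Return the label to display, honouring the user's toggle."""
    if friendly_mode:
        # Fall back to the technical name if no friendly label exists.
        return FRIENDLY_LABELS.get(function_name, function_name)
    return function_name
```

Keeping the technical name as the fallback means the toggle degrades gracefully for any function that hasn’t been given a friendly label yet.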
Phew! This was a long post. I will try to share more samples in later posts. The Discovery module redesign had some 80+ screens, so I am sure there’s more to learn from all the iterations of the various screens.
Quickly summarising my takeaway from this exercise:
- Data science can be overwhelming for non-data-science users. There’s a need to bridge the gap between technology and its users in this domain.
- Static charts and graphs alone don’t cut it any more. Users want more interaction: to play, drill down and generate insights. And any tool that helps with that has to cater to all types of users, not just data scientists.
- It is not about beautiful visualisations; it’s about data and insights that make sense.
- When you are creating a domain-specific application, it is easier to give insights. But when you are trying to create a generic platform, one that can be fed data of any type or size, a lot of technical feasibility comes into the picture, and handling that while still keeping the user’s experience positive is the ultimate trick.
With that, signing out for today. See you soon with more insights as I explore this domain further. So far, I am loving it.