I am part of a very active WhatsApp group chat with my college friends, and I got interested in generating some statistics about the chat. So here we go.
First, I exported the WhatsApp chat history to my MacBook using the "email conversation" option in the group chat's "More" tab. (I only had chat history starting from 9 February 2015, because I switched to a new phone that day.) The export arrived in my mailbox, and I downloaded the attached txt file to my laptop. I chose to leave out the media files, since I was interested only in the text messages.
This is what I have now:
[sreedish.ps@~/Downloads$]cat chat.txt | head
9 Feb 10:28 pm - +91 99160 54737 created group “Ooty Pattanam”
9 Feb 10:28 pm - You were added
11 Feb 7:32 pm - Sreedish: I lost all my what's app history
11 Feb 7:32 pm - Sreedish: Changed my phone
11 Feb 7:51 pm - Nithin Mbt: No backup of mobile possible ?
11 Feb 7:54 pm - Sreedish: Gallery and contacts restored
11 Feb 7:54 pm - Sreedish: But not chat history
11 Feb 8:04 pm - Nithin Mbt: Umm
11 Feb 8:09 pm - Anoop Mbt: Which phone?
11 Feb 9:35 pm - Sreejith Mohan:
I used two Unix commands here: cat and head. cat prints the contents of the file to stdout, and I piped that into head, which prints only the first ten lines.
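As a side note, head can also read a file directly, so the cat step is optional. The sketch below uses a hypothetical sample.txt built from a few of the lines above, so it runs without the real export; -n sets the number of lines to print, and 10 is the default.

```shell
#!/bin/sh
# Hypothetical sample file built from lines shown above, so the example
# runs without the real chat.txt export.
printf '%s\n' \
  '9 Feb 10:28 pm - You were added' \
  '11 Feb 7:32 pm - Sreedish: Changed my phone' \
  '11 Feb 7:51 pm - Nithin Mbt: No backup of mobile possible ?' > sample.txt

# These two commands print the same thing: head reads the file itself,
# and -n 2 asks for the first two lines instead of the default ten.
cat sample.txt | head -n 2
head -n 2 sample.txt
```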
My first goal was to find out who the most active member of the group chat is. For that I needed to count the number of messages sent by each member, sort the counts, and pick the member with the most messages. I noticed that every message followed a neat format:
"date month time - sender:message"
So to get the sender name, I needed to extract whatever lies between the "-" (hyphen) and the ":" (colon).
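One caveat: in a plain-text export like this one, a message that spans several lines typically continues on lines without the "date month time - " prefix, and those lines would otherwise be counted too. A minimal sketch of a filter, assuming every message header looks exactly like the lines shown above (no year, 12-hour time with am/pm), keeps only lines that start with that prefix. The sample file and its continuation line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: two message headers plus one made-up continuation
# line of a multi-line message (it has no timestamp prefix).
printf '%s\n' \
  '11 Feb 7:54 pm - Sreedish: Gallery and contacts restored' \
  'second line of the same message' \
  '11 Feb 8:04 pm - Nithin Mbt: Umm' > sample.txt

# Keep only lines that begin with "day month h:mm am/pm - ".
# Assumption: the header format matches the lines shown in this post.
grep -E '^[0-9]{1,2} [[:alpha:]]{3} [0-9]{1,2}:[0-9]{2} [ap]m - ' sample.txt
```

Piping chat.txt through a filter like this before the awk steps would keep continuation lines out of the per-sender counts.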
[sreedish.ps@~/Downloads$]cat chat.txt | awk -F '-' '{print $2}' | head
+91 99160 54737 created group “Ooty Pattanam”
You were added
Sreedish: I lost all my what's app history
Sreedish: Changed my phone
Nithin Mbt: No backup of mobile possible ?
Sreedish: Gallery and contacts restored
Sreedish: But not chat history
Nithin Mbt: Umm
Anoop Mbt: Which phone?
I used awk, a powerful tool and my favourite, to do this. The command was:
cat chat.txt | awk -F '-' '{print $2}' | head
This means: cat the file to stdout and pipe it to awk. awk splits each input line into fields, and by default the fields are separated by whitespace. By passing -F '-', I tell awk to use the hyphen as the field separator instead. '{print $2}' means: after splitting on the hyphen, print the second field.
For example, take the line "11 Feb 7:32 pm - Sreedish: Changed my phone". After splitting with the hyphen as the delimiter:
$1 = 11 Feb 7:32 pm
$2 = Sreedish: Changed my phone
I wanted $2 because it contains the sender name. I used head because I didn't want to flood my terminal.
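To see the splitting on a single line, the sketch below feeds the example line to awk with and without -F. The brackets only make the spaces around each field visible: the fields keep the spaces that surrounded the hyphen, and with the default whitespace splitting the sender name lands in $6 instead.

```shell
#!/bin/sh
line='11 Feb 7:32 pm - Sreedish: Changed my phone'

# Hyphen as the field separator: two fields, spaces kept at the edges.
echo "$line" | awk -F '-' '{ print "[" $1 "]" }'   # prints [11 Feb 7:32 pm ]
echo "$line" | awk -F '-' '{ print "[" $2 "]" }'   # prints [ Sreedish: Changed my phone]
echo "$line" | awk -F '-' '{ print NF }'           # prints 2

# Default separator (runs of blanks): "11", "Feb", "7:32", "pm", "-", "Sreedish:", ...
echo "$line" | awk '{ print $6 }'                  # prints Sreedish:
```

The leading space in $2 carries through to the extracted names, which is harmless for counting as long as no sender name itself contains a hyphen, since every name then gets the same leading space.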
Next, I split on the colon to extract only the sender's name.
[sreedish.ps@~/Downloads$]cat chat.txt | awk -F '-' '{print $2}' | awk -F ':' '{print $1}' | head
+91 99160 54737 created group “Ooty Pattanam”
You were added
Sreedish
Sreedish
Nithin Mbt
Sreedish
Sreedish
Nithin Mbt
Anoop Mbt
Sreejith Mohan
The command is
cat chat.txt | awk -F '-' '{print $2}' | awk -F ':' '{print $1}' | head
I piped the output of the first awk into a second awk that uses ':' as the delimiter. This time I wanted $1, because the sender's name comes before the delimiter. Now that I had only the sender names, all I had to do was sort them and count them.
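The two awk steps can also be folded into one. This is a possible variant, not the pipeline used in this post: split on " - " (space, hyphen, space) so the sender part starts cleanly, then cut it at the first colon with awk's built-in split(). The array name parts and the sample file are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample lines, taken from the chat shown above.
printf '%s\n' \
  '9 Feb 10:28 pm - You were added' \
  '11 Feb 7:32 pm - Sreedish: Changed my phone' \
  '11 Feb 7:51 pm - Nithin Mbt: No backup of mobile possible ?' > sample.txt

# A multi-character -F value is treated as a regular expression, here " - ".
# split($2, parts, ":") breaks the rest of the line at each colon, and
# parts[1] is everything before the first one: the sender name.
# Prints: You were added, Sreedish, Nithin Mbt (one per line).
awk -F ' - ' '{ split($2, parts, ":"); print parts[1] }' sample.txt
```

Unlike the two-step version, this drops the leading space before each name, because the space is part of the separator.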
[sreedish.ps@~/Downloads$]cat chat.txt | awk -F '-' '{print $2}' | awk -F ':' '{print $1}' | sort | uniq -c | sort -r | head -14
3093 Sreedish
2285 Aravind S Chennai
2104 Kk Bangalore
1527 Sreejith Mohan
959 Keeru Unname
713 KK US
688 Rahul Raghavan
629 Nithin Mbt
428 Rajesh Babu Nit
182 Anoop Mbt
70 Shekar
43 Jyothi
37 Suman
34 George
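A note on the ordering step: uniq -c only merges identical lines that are adjacent, which is why the names are sorted before counting. The final sort -r compares the "count name" lines as text; that appears to give the right order here because uniq -c pads the counts to a fixed width, but it relies on that padding, and sort -rn, which compares the leading numbers numerically, is the safer choice. A small sketch with a made-up list of senders:

```shell
#!/bin/sh
# Made-up sender list standing in for the output of the two awk steps.
printf '%s\n' Sreedish 'Nithin Mbt' Sreedish 'Anoop Mbt' Sreedish 'Nithin Mbt' > senders.txt

# Without the first sort, uniq -c counts each adjacent run separately,
# so every name here comes out with a count of 1.
uniq -c senders.txt

# Sort so duplicates are adjacent, count them, then order numerically by
# count, largest first: 3 Sreedish, 2 Nithin Mbt, 1 Anoop Mbt.
sort senders.txt | uniq -c | sort -rn | head -n 3
```

For the real chat, that would be the command above with sort -r replaced by sort -rn.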
Command used is
