0. Blackpink In Your Area
I love Blackpink. I'm not a fan of k-pop, but there's something about this YG Entertainment group that had me hooked. Maybe it's their music, their visuals, their concept, or my major crush on Jisoo 💛, but I love this group, and I had a huge Blackpink phase in 2021.
Like, the watching-their-documentary-and-music-videos-more-than-once kind of phase. Maybe more times than it's cool to admit.
Aaaaanyway... besides my bias, I also find k-pop stans quite intriguing. After fan armies started turning political, I couldn't help but feel curious about the community. As an outsider, it's hard to understand the whole context. But as a Blackpink fan myself, maybe I could unravel a bit of the content surrounding the Blink (Blackpink stans) fandom.
So here comes the question: what's the content structure of the Blink fandom on Twitter?
The dataset consists of 983,292 tweets from November 7 to November 9, 2021.
I used 6 seed hashtags: 2 related to the group (Blackpink, Blink) and the 4 members' names (Jisoo, Jennie, Rosé, Lisa).
I processed the data in Gephi to get a co-hashtag graph of the top 500 hashtags. For context, a co-hashtag graph is a network of the relationships between hashtags: if a tweet contains two hashtags, they appear linked in the graph.
The graph is undirected with weighted links: the more tweets two hashtags share, the stronger the link between them.
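The graph itself was built in Gephi. As an illustration of the idea only, here is a minimal plain-Python sketch, assuming each tweet has already been reduced to a list of its hashtags. The function name, the input format, and the lowercasing step are assumptions, not part of the original workflow. Each pair of top hashtags appearing in the same tweet adds 1 to the weight of their link.

```python
from collections import Counter
from itertools import combinations


def co_hashtag_graph(tweets, top_n=500):
    """Build an undirected, weighted co-hashtag graph (illustrative sketch).

    tweets: one list of hashtags per tweet (assumed input format).
    Returns {(tag_a, tag_b): weight} with tag_a < tag_b, so each
    undirected link is stored exactly once.
    """
    # Lowercase (an assumption) and deduplicate hashtags within each tweet
    tag_sets = [{tag.lower() for tag in tags} for tags in tweets]
    # A hashtag's frequency is the number of tweets that use it;
    # ties at the top_n cutoff are broken arbitrarily
    freq = Counter(tag for tags in tag_sets for tag in tags)
    top = {tag for tag, _ in freq.most_common(top_n)}
    edges = Counter()
    for tags in tag_sets:
        # Every pair of top hashtags sharing a tweet adds 1 to that link's weight
        for a, b in combinations(sorted(tags & top), 2):
            edges[(a, b)] += 1
    return dict(edges)


tweets = [["Blackpink", "Lisa"], ["blackpink", "Jisoo", "Lisa"], ["Jennie"]]
print(co_hashtag_graph(tweets))
# {('blackpink', 'lisa'): 2, ('blackpink', 'jisoo'): 1, ('jisoo', 'lisa'): 1}
```

Storing each link under a sorted pair is what makes the graph undirected: "blackpink + lisa" and "lisa + blackpink" count toward the same link.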
2. Data Cleanup
Given my first results were... confusing, I started cleaning up the set. To do so, I applied the ForceAtlas 2 layout in Gephi, isolated the network's giant component, and color-coded each member's ego network.
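In Gephi, isolating the giant component is a filter. For readers without Gephi, here is a minimal plain-Python sketch of the same operation, assuming the graph is stored as a `{(a, b): weight}` dictionary; that format and the helper names are hypothetical. It finds connected components with a breadth-first search and keeps the largest one.

```python
from collections import defaultdict, deque


def giant_component(edges):
    """Return the node set of the largest connected component.

    edges: {(a, b): weight} for an undirected graph (assumed format).
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one whole connected component
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in comp:
                    comp.add(nxt)
                    queue.append(nxt)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def restrict(edges, nodes):
    """Keep only the links whose two endpoints are both in `nodes`."""
    return {(a, b): w for (a, b), w in edges.items() if a in nodes and b in nodes}


edges = {("a", "b"): 1, ("b", "c"): 2, ("x", "y"): 1}
print(sorted(giant_component(edges)))  # ['a', 'b', 'c']
```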
The results weren't enlightening (or at least I couldn't get anything out of them), so I had to rethink the plan.
After this initial exploration, I considered my limitations and biases to formulate a proper objective and plan for this endeavor. After all, fumbling with Gephi and spreadsheets aimlessly wouldn't get me anywhere insightful. Thus, I figured the following:
Limitations:
1. Twitter hashtags are not an overall indicator of popularity.
2. Seed hashtags are too generic.
3. Seed hashtags are too linked to each other.
4. The Blink community behaves very differently on every platform.
5. I only have 2 days of data.
Biases:
1. I am a Blackpink fan.
2. I have favorites: Jisoo & Lisa.
3. Curse of knowledge: the topic is not well known (or known at all) to my audience. (This is a data designer's portfolio; I can't assume my readers' music taste, nor fangirl without giving some context.)
4. Availability heuristic: 2 days of Twitter data.
5. Stereotyping: I’m an outsider to k-pop and Korean culture.
To explain the limitations a bit: not all Twitter users actively use hashtags on the platform, and I wasn't active enough in the Blink community to be familiar with its behavior before I started this experiment. I am a fan, yes, but more of the "I listen at home and consume their content alone" type, so I couldn't draw relevant conclusions about the community even if I wanted to.
And even if I were a proper insider, two days of data is not enough to understand such a complex community of fans of an insanely popular group.
Also, the seed hashtags are too generic: there are many other famous Lisas, for instance. So the data needed a more focused cleanup. Color-coding the ego networks wasn't enough either: since the girls' individual careers and topics are so intertwined, the results changed depending on the selection order. So I also needed a better way to isolate these networks, but this was doable and interesting enough.
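Why would the selection order matter? One plausible reading, sketched below under that assumption, is that coloring ego networks one after another works like painting: a hashtag linked to several members keeps only the color applied last. The member sets here are made-up toy data and the function names are hypothetical; this is not how Gephi is implemented.

```python
def color_by_ego(ego_networks, order):
    """Paint ego networks one after another; a later paint overwrites an earlier one.

    ego_networks: {member: set_of_nodes} (hypothetical input).
    """
    colors = {}
    for member in order:
        for node in ego_networks[member]:
            colors[node] = member  # overwrites any color assigned earlier
    return colors


def order_dependent_nodes(ego_networks):
    """Nodes in more than one ego network: their final color depends on the order."""
    counts = {}
    for nodes in ego_networks.values():
        for node in nodes:
            counts[node] = counts.get(node, 0) + 1
    return {node for node, n in counts.items() if n > 1}


egos = {"jisoo": {"shared", "only_jisoo"}, "jennie": {"shared", "only_jennie"}}
print(color_by_ego(egos, ["jisoo", "jennie"])["shared"])  # jennie
print(color_by_ego(egos, ["jennie", "jisoo"])["shared"])  # jisoo
print(order_dependent_nodes(egos))  # {'shared'}
```

The more intertwined the members' topics, the larger `order_dependent_nodes` gets, which is why isolating each ego network separately is the sturdier approach.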
Knowing this, I set myself a proper research goal to work with the data I had:
Analyze the Blackpink Twitter hashtag network and each of the members’ ego networks spanning 2 days of content.
After all, besides being members of Blackpink, all four girls have established careers as singers, models, spokespeople, and actresses. Thus, I wanted some perspective on their individual reach and related topics over these two days.
4. Better Cleanup
To get an ego network, I selected a member's node along with all the nodes directly linked to it. So, instead of coloring everything together, I isolated 4 networks around the members' names: Jennie, Jisoo, Lisa, and Rosé. I also kept the original network as is, to compare it with the ego networks later.
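Gephi provides an ego network filter for this. As a rough plain-Python equivalent, here is a minimal sketch of a depth-1 ego network, assuming the same hypothetical `{(a, b): weight}` graph format: it keeps the center node, its direct neighbors, and every link among those nodes. Depth 1 is an assumption about the settings used.

```python
def ego_network(edges, center):
    """Depth-1 ego network: the center, its neighbors, and all links among them.

    edges: {(a, b): weight} for an undirected graph (assumed format).
    """
    nodes = {center}
    for a, b in edges:
        if a == center:
            nodes.add(b)
        elif b == center:
            nodes.add(a)
    # Induced subgraph: keep links between any two selected nodes
    return {(a, b): w for (a, b), w in edges.items() if a in nodes and b in nodes}


edges = {("lisa", "a"): 3, ("a", "b"): 1, ("b", "c"): 1, ("lisa", "b"): 2}
print(ego_network(edges, "lisa"))
# {('lisa', 'a'): 3, ('a', 'b'): 1, ('lisa', 'b'): 2}
```

Keeping the links among neighbors (not just the spokes from the center) matters here, because those are exactly the relationships left to study once the member's own node is removed.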
Then I removed the main actors from all the graphs: the seed hashtags and their variations. Why? To study the relationships between the surrounding concepts without the glue that binds them together.
So I removed the following nodes:
Blackpink, Blink, 블랙핑크, 블링크
Jennie, Kim Jennie, 제니, 김제니
Jisoo, Kim Jisoo, 지수, 김지수
Lisa, Lalisa* Manoban, 리사, ลิซ่า
Rosé, Rose, Roseanne, Park Rosé, Chaeyoung, 로제, 박채영
*Lalisa on its own wasn’t removed because it’s also a song/album name.
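The removal step can be sketched as a filter over the links, using the list above. This is a minimal sketch under assumptions: the graph format is the hypothetical `{(a, b): weight}` dictionary, and hashtags are compared in a lowercase, space-free, Unicode-NFC form, since hashtags can't contain spaces. Reading "Lalisa* Manoban" as the single tag "lalisamanoban" is also an assumption; plain "Lalisa" is deliberately kept, per the footnote.

```python
import unicodedata

SEEDS = [
    "Blackpink", "Blink", "블랙핑크", "블링크",
    "Jennie", "Kim Jennie", "제니", "김제니",
    "Jisoo", "Kim Jisoo", "지수", "김지수",
    "Lisa", "Lalisa Manoban", "리사", "ลิซ่า",  # plain "Lalisa" is NOT listed
    "Rosé", "Rose", "Roseanne", "Park Rosé", "Chaeyoung", "로제", "박채영",
]


def canonical(tag):
    # NFC so "é" typed two different ways compares equal; then lowercase, no spaces
    return unicodedata.normalize("NFC", tag).lower().replace(" ", "")


REMOVE = {canonical(t) for t in SEEDS}


def drop_seeds(edges):
    """Drop every link that touches a seed hashtag or one of its variations."""
    return {(a, b): w for (a, b), w in edges.items()
            if canonical(a) not in REMOVE and canonical(b) not in REMOVE}


edges = {("blackpink", "lalisa"): 5, ("lalisa", "sg"): 2,
         ("KimJennie", "x"): 1, ("rosé", "y"): 1}
print(drop_seeds(edges))  # {('lalisa', 'sg'): 2}
```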
After that, I went through the communities individually and started removing irrelevant results. For instance, there were tweets that only used popular hashtags for unrelated reasons, like quoting other random hashtags or promoting porn (classic internet 😩).
Also, given that Lisa's and Rosé's names are common, I removed all the communities related to other artists or topics.
After a bit of manual cleaning, I finally got something to work with.
5. Results and Analysis
Finally, I got 5 networks with a clearer structure and related topics. So I went through all the communities and colored them according to their main topics.
To do so, I identified the "main" or recurring node in each community and colored the community accordingly. Then I named each community after the topics it contained.
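The post picks each community's main node by inspection. One way to make that repeatable, assumed here and not necessarily the author's criterion, is to take the node with the highest weighted degree (the sum of its link weights) in each community. The sketch below also assumes the community assignment is already known, however it was obtained; the input formats and function name are hypothetical.

```python
from collections import defaultdict


def main_nodes(edges, community_of):
    """Pick, per community, the node with the highest weighted degree.

    edges: {(a, b): weight}; community_of: {node: community_id} (assumed formats).
    Treating "main node" as "highest weighted degree" is an assumption.
    """
    strength = defaultdict(int)
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    members = defaultdict(list)
    for node, comm in community_of.items():
        members[comm].append(node)
    # Highest weighted degree wins; ties are broken alphabetically
    return {comm: min(nodes, key=lambda n: (-strength[n], n))
            for comm, nodes in members.items()}


edges = {("a", "b"): 3, ("a", "c"): 1, ("d", "e"): 2, ("d", "f"): 2}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(main_nodes(edges, community_of))  # {0: 'a', 1: 'd'}
```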
And finally, I got the 5 Blackpink Networks: an analysis of 2 days of Twitter content.
The first thing to analyze was the communities' presence within the networks. I classified them on a Venn diagram, then I counted them.
I noticed there were two important distinctions:
1. Some communities also appear in the original Blackpink network.
2. Some of these communities are related to events spanning a specific timeframe.
I marked these cases within the diagram and made a timeline of the events to be able to draw conclusions.
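The Venn classification boils down to set bookkeeping: for each community, record which of the 5 networks it appears in, then count how many communities fall into each exact combination of networks. A minimal sketch, assuming each network has been reduced to a set of community labels; the labels and function names below are hypothetical toy data, not the actual results.

```python
from collections import Counter


def venn_regions(networks):
    """Map each community label to the set of networks it appears in.

    networks: {network_name: set_of_community_labels} (hypothetical input).
    """
    regions = {}
    for name, labels in networks.items():
        for label in labels:
            regions.setdefault(label, set()).add(name)
    return regions


def region_counts(regions):
    """Count communities per Venn region (each exact combination of networks)."""
    return Counter(frozenset(names) for names in regions.values())


def shared_with_group(regions, group="Blackpink"):
    """Communities that also appear in the original group network."""
    return sorted(label for label, names in regions.items() if group in names)


networks = {"Blackpink": {"t1", "t2"}, "Lisa": {"t1", "t3"}, "Jisoo": {"t3"}}
regions = venn_regions(networks)
print(shared_with_group(regions))  # ['t1', 't2']
```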
After this analysis, I noticed the following about each member:
Lisa has the most diverse events and content related to her. She also has most of the latest releases (Lalisa & SG), which might explain that diversity.
Jisoo has the least diverse networks and content. She also shares the least with the other members, and her individual topics don't appear in the Blackpink network.
Rosé’s content is not as recent as the other members’. After all, her last single was released in March 2021, while the other members have more recent releases.
Jennie’s brands and sponsorships are the most prominent within the group. This suggests that, within this time period, Jennie was the most effective brand spokesperson.
Bottom line, this was a fun exercise. I'd love to do it again, without the Lalisa hashtag, to see what might change by having no exceptions.
It would also be insightful to gather data from other social networks (e.g. Instagram, YouTube, idol sites, and forums) and over different periods of time. What would change with their next single or album release? Or with a major sponsorship deal?
Or with a different artist/band. Are their concepts as intertwined as in these networks? How different are the individual-collective dynamics surrounding the conversation of a band?
As in any research project, this left me with a whole lot of questions I might explore eventually. But network analysis was an interesting enough discipline to explore my 2021 not-so-guilty pleasure.
Project advised by Pablo Aragón Asenjo.