0. Crazy, but the fun type
I began using dating apps for the first time ever in January 2022. Back then, swiping and browsing profiles was super fun and exciting. After several days, I started noticing patterns and asking questions.
Why are there so many Tauruses and Cancers?
Why are so many of these people into bouldering?
Why do I rarely find women on these apps?

11 months and many conversations, disappointments, and dates later, I don't find the apps so exciting anymore. But the questions never went away: I'm still pondering the patterns that prevail in my dating pool. So, after months of thinking about this, I'm going to show you a rough picture of the catalog of my dating prospects.
1. Bisexual woman, Aquarius sun
For this experiment, I decided to use only Bumble as this app encourages users to share more information about themselves. For the ones who've never used the app, after a match* is made, women have to make the first move. So the profiles are built as a way to help us start conversations and get to know people more than just... hooking up. (But you can also find hookups, obviously).
As an awkward person, I love having more info about my match to say something other than "hi". And as a researcher, I love it even more. Bumble profiles offer the chance to share a lot of neatly categorized data, so I'm gathering their categories as is.

But before talking about the categories, we need to start with my profile. Why? Because I hypothesize that:
1. The algorithm shows me potential suitors whose personal information matches mine, or who are selected to be compatible with it.
2. Potential suitors have been refined according to my choices since I first downloaded the app.

So, as embarrassing as this is for me, here are some screenshots of my profile's most relevant sections and the information I chose to display. This profile stayed the same during the whole experiment.
Anyway, please don't judge my profile. Just... take my info as a starting point to understand the data set I'm gonna describe further on. 

*Match = when 2 people in a dating app have mutually "swiped right" on each other. "Swiping right" is the name of the gesture for liking a profile. Once there's a match, these people can talk to each other.
This project was NOT sponsored by Bumble. I just do these things in my free time.

Fun fact: this pic was taken on About Time's opening night. So it's the perfect combination of a professional and a swimsuit pic.

2. A whole picture of you
There are more than 20 data points you can submit to the app, and most of them are option based. So, given the exploratory nature of this experiment, I decided to gather most of them to be able to analyze all patterns possible, not only the ones I casually noticed.
Also, as a user myself, I chose to not answer some questions and I noticed most people don't answer them all either. So the absence of a variable is another factor I noted.

How did I gather this data? By swiping profiles in the app manually and inputting their data into a spreadsheet. It was a long and tedious process, but I collected the following variables:
Let's also remember that, in addition to my researcher role, I'm also an app user and I want to find people to join me for city trips and beers. So clearly, I have preferences of my own. And, if my hypotheses are correct, my choices will affect the types of profiles I'll see as the experiment goes on.
So, for control purposes, I also gathered:
1. The date when I saw the profile
2. If I was interested or not in the person (swipe left or right)

I'm not gonna delve into the logic of my swipes/preferences though. After all, these are pretty subjective and, quite often, inconsistent. It's not like the app will know if I swiped right because I found someone attractive or because I liked their hobbies, and you don't have to know either 🤫.
But this experiment started out of personal curiosity, so I strived to get hard data to answer the following questions:
1. Which of my variables appear the most in my dating pool?
2. What's the ratio between women, men, and non-binary people in my dating pool?
3. What's the most frequent and average age and height of the people in my dating pool?
4. From how many different countries are the people in my dating pool?
5. What's the predominant zodiac sign in my dating pool? (I'm an astrology fan after all).
6. Are there any interesting connections between data points in my dating pool?
7. If I could make an average person out of all these people, what traits would they have?

So, for 13 days (August 2nd to 15th), I started swiping and collecting data*.

*For privacy purposes, this case study only shows general data and overviews. I gathered no personal or sensitive information from any of the 150 people I found: just information that doesn't make any of them easily identifiable. I did not keep screenshots of any of these profiles.
3. 私は間違った (I made a mistake)
So after painstakingly reviewing and inputting 150 profiles, I realized I'd made a crucial mistake: I forgot to check the "Date Filters" option. Meaning, this little icon next to the Bumble logo that I hadn't touched since January.
To add some context, Date Filters let you fine-tune your dating pool by restricting the values of certain variables. The free filters are "gender", "age", "distance", and "languages they know". But paid users can filter on all variables and, for instance, display only people who are taller than a certain height or only from a specific religion.

Anyway, back in January I also had the secondary goal of finding a language exchange partner. And, given that the app also shows you users who don't speak all of the chosen languages, I completely forgot about this. I just got carried away.
But well, sometimes mistakes happen when you're too excited about collecting data for fun. And mistakes are valuable parts of any story; sugarcoating them is a tad boring, innit? 😅

Also, I never shortened the distance, which hindered my personal goal of finding someone to drink a beer with. The age filter has always been intentional though.

So, instead of redoing everything, I'd rather show these filters as they are: constraints for the current dataset. In other words, the app prioritized people who:
1. Are between 25 and 36 years old.
2. Speak English and/or Japanese.
3. Live basically anywhere.
4. Proportions
First, I wanted to show the proportional distribution of all the variables. For this, I showed every single person as a data unit, for two reasons:
1. More than stats, they're individuals.
2. It's kind of fun to imagine them as a crowd: out of 150 people, who can be classified as what?
So here are the charts* for the personal info variables:

*This website is built in Adobe Portfolio. So, as much as I want to, I can't show interactive elements or visualizations here. So, while I finish coding the new version of this website, enjoy these slideshows (:
The lifestyle variables:
The relationship and beliefs variables:
And finally, my swipes and matches.
Out of 22 people I liked, I matched with 5 of them.
2 of them added me on Instagram.
I talked to one of them a couple of times and then... nothing.
So, out of 150 people, I didn't get a single date 😅

So no, I didn't get someone to come with me for city trips and beer this time. But at least I have interesting data.
Let's keep analyzing.
5. Distributions and similarities
Let's get back to one of my initial questions:
What's the ratio between women, men, and non-binary people in my dating pool?

I showed the chart in the previous section but I'll write the numbers here.
Men: 91.33% (137)
Women: 6.67% (10)
Non-binary: 2% (3)
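For anyone who wants to double-check the arithmetic, here's a minimal sketch of how those percentages come out of the raw counts. The counts are the ones listed above; the variable names are just illustrative.

```python
# Counts taken from the list above; names are illustrative.
counts = {"Men": 137, "Women": 10, "Non-binary": 3}

total = sum(counts.values())

# Share of each group as a percentage of the whole pool, rounded to 2 decimals.
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}

for group, pct in shares.items():
    print(f"{group}: {pct}% ({counts[group]})")
```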

As I suspected, my dating pool is overwhelmingly male. So, reasonably, most of my swipes (and all of my matches) are men. I read a statistic that, in the US, "62% of users are men and many women are overwhelmed by the number of options they have." (Matos, 2022)*. 62% is a big number. But 91.33%... oof. No wonder I thought there were no women in here.
Fun fact: the 3 non-binary people I found during this experiment are the first non-binary people I've seen on dating apps and the only ones I've ever found on Bumble.

Then I decided to answer my first question:
Which of my variables appear the most in my dating pool?
But after seeing my low number of swipes and matches, I wanted to add a new dimension to it: how do these variables relate to mine?
In other words, are these people's data not compatible with mine? Or am I just really picky?
First, I drew a bar chart showing the percentage of people who chose to fill in a specific variable.
As it shows, most of the variables were filled in by the users. And interests, languages, and drinks are the top 3.

Variables in bold text are the ones I filled in my profile.

I think interests are relevant because they're the most obvious sign of compatibility (and the easiest way to approach someone: ask about their interests).
Languages are also essential for communication.
So, for the rest of the categorical variables, I designed a series of ranking charts to show the distribution of these values in an easier way, and to see where my own answer is among the data.
So here are the charts for the personal info variables:
The lifestyle variables:
And the relationship and beliefs variables:
By putting the data this way, I noticed that my value is not part of the majority in most variables. It's only in two of them: education level and religion.
To be honest, I didn't expect it to be in the majority as frequently.
What perplexes me are the variables in which it's the minority: "looking for" and "kids". Because both variables, in my opinion, represent fundamental incompatibilities. Let me elaborate:
Even though both of them are categorical variables, their values can be ordered as levels on a scale. And how acceptable the other values are depends on where you are on that scale.
For instance, I'm on one of the "extreme" ends of the scale: I don't want kids. Thus, it wouldn't make sense if the platform showed me people with values on the opposite ends of the scale (have and want more kids) because we'd be fundamentally incompatible.
After all, if someone doesn't want kids the last thing they'd do is try to match with someone who already has them.

But the platform still shows me people with a view I consider undesirable: people who want kids someday. Why? I can speculate two things:
1. The pool of users with an "acceptable" and "ideal" view is not large enough.
2. For the algorithm, these tolerance values are not as rigid.
After all, if someone wants kids in the future the date might still work... as long as we're not together in said future. Or views may change, I don't know. I didn't make the algorithm, I'm just speculating about it.
In this variable, the majority of the users fall in the undesirable value (want someday, 56 users: 37.33%).
Tolerance levels for the kids variable. The "ideal" value is my own (don't want kids) and acceptance worsens the further a category is from said value. Acceptance categories are subjective.
The same logic applies to what we're looking for. I got no users from the opposite end of the scale (marriage), but I did get many users pursuing a relationship (28 users: 18.67%).
Unlike the previous variable, the majority of the users fall into the acceptable value (don't know yet, 72 users: 48%).

Note: my definition of "something casual" is an interpersonal relationship with no expected romantic arrangement. I consider one-night stands, friends with benefits, or occasional meetings as "something casual". Casual relationships still have defined rules and boundaries.
A "relationship" entails formal romantic commitment and a series of agreements and expectations. This may or may not be monogamous.
Given that these terms have no standard definition, I decided to explain my understanding of them. This is the basis for my conclusions regarding the "Looking for" variable.

Tolerance levels for the looking for variable. The "ideal" value is my own (something casual) and acceptance worsens the further a category is from said value. Acceptance categories are subjective.

I decided to combine both variables because I noticed their position on the scale depends on the same parameter: the level of commitment a type of relationship entails.
Dating someone with kids involves a lot of commitment, especially if the person gets involved with said kids. And marriage is a legally binding contract; getting out is not as simple as breaking up and vowing to never speak again.*
Thus I combined them to see if the level of commitment matches in both variables: do people who want a more committed relationship also want kids?

*I'm not saying breakups are easy. They're not. But at least there shouldn't be any legal fees and processes involved.
When putting both variables together, I immediately noticed that more people are willing to commit to a future kid of their own (56) than to a formal relationship with another person (28).
Also, among the people who want kids eventually, the largest group doesn't know what they're looking for in the app (23 people). 18 people who want a relationship also want kids.
Numbers look a bit more consistent with the people who are not sure about having kids. More than half of them (15 people: 57.69%) also don't know what kind of dynamic they want.
The largest group overall consists of people who don't know what they're looking for and didn't specify their preference regarding kids (26 people).
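Combining two variables like this is basically a cross-tabulation. Below is a rough sketch of one way to do it. The rows are made up for illustration (the real spreadsheet isn't published), and `share_within` is a hypothetical helper, not part of the original analysis.

```python
from collections import Counter

# Made-up rows for illustration only; each tuple is (looking_for, kids).
rows = [
    ("don't know yet", "want someday"),
    ("relationship", "want someday"),
    ("don't know yet", "not sure"),
    ("don't know yet", "not sure"),
    ("something casual", "don't want"),
    ("don't know yet", None),  # None = the person left "kids" blank
]

# How many people fall into each (looking_for, kids) combination.
pair_counts = Counter(rows)

# How many people gave each "kids" answer, regardless of "looking for".
kids_totals = Counter(kids for _, kids in rows)


def share_within(looking_for, kids):
    """Percentage of people with a given 'kids' answer who chose 'looking_for'."""
    return round(100 * pair_counts[(looking_for, kids)] / kids_totals[kids], 2)


print(pair_counts.most_common(1))
print(share_within("relationship", "want someday"))
```

The same division gives the 57.69% mentioned above: 15 people out of the 26 who are not sure about kids.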

Yet, considering how many people want kids someday (56), it's curious to see how few of them want a relationship at this stage. Maybe it's an age thing? Not wanting to find the co-parent of their future kids here? Then again, the only thing I can do is show patterns and speculate.
6. Crunching numbers
Age and height are the only numerical variables in the dataset. To show these distributions, it made sense to create histograms.
For the age chart, I marked the values outside the range I selected in red: anyone younger than 25 or older than 36. I also highlighted my own age (28) on the x-axis.
I didn't filter by height though, so there are no outliers in this variable.
Curiously, the average height (175.95 cm) ended up being really close to my own (175 cm). And most of the people in my pool are between 174 and 178 cm tall.

Going back to question #3:
What's the most frequent and average age and height of the people in my dating pool?
The most frequent values are the mode of each variable.
In this case, 30 years and 180 cm.
The averages are 29.46 years and 175.95 cm.
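The mode and the average can land on different values, which is why both are listed. Here's a small sketch using Python's built-in `statistics` module. The heights are made-up sample values, not the real dataset.

```python
from statistics import mean, mode

# Made-up sample heights in cm, for illustration only.
heights_cm = [180, 180, 172, 174, 176]

most_frequent = mode(heights_cm)         # the value that appears most often
average = round(mean(heights_cm), 2)     # the arithmetic mean

print(most_frequent, average)
```

In this toy sample the most frequent height sits above the average, the same kind of gap that shows up in the real numbers.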
7. Around the world
There are people from 22* different countries and 49 different cities in my dating pool.

*This answers question #4: from how many different countries are the people in my dating pool?

For the interactive version, click here.

There are only 4 cities with more than one person:
1. Barcelona, Spain - 29
2. Tokyo, Japan - 13
3. Buenos Aires, Argentina - 5
4. Sabadell, Spain - 3

The rest of the cities and most of the countries only have 1 person each.
As expected, most people in my data come from Spain. Specifically, from Catalonia. So I made a zoomed-in, interactive version of this map.
I also decided to make a zoomed-in version of the Japanese map for better readability.
Given the language filter, it makes sense that this country comes third in the ranking.

But yeah, my dating pool is quite international. This is Barcelona after all, an amazing, cosmopolitan city in the Mediterranean.
Didn't get a single person from an African country though. It would've been nice to see at least one person from each continent. 😅
8. Hola, hello, こんにちは
Given that my dating pool is quite international, it makes sense that there's also significant linguistic variety. And there is: there are 31 different languages in my dataset.
English is the most spoken one, because of how common it is as a second language.
#2 is Japanese, because of my filter, probably.
And #3 is Spanish, because we're in Spain.

As a foreigner living in Barcelona, I realized that you can get by perfectly well without speaking Spanish or Catalan. After all, there's a huge international community in the city and many live here without learning the local languages. As a matter of fact, for my first 6 months here, I spoke more English than Spanish because I hung out with more foreigners than locals.

The languages in bold are the ones I speak.

All of the 31 languages in the dataset.

There's a widespread saying that there are two Barcelonas: the Catalan one and the international one. I belong to the second one, and in this community, most people I know speak several languages.
After seeing the data, I became curious about 2 things:
1. How many languages are spoken by the people in my dating pool?
2. How are these languages linked?
Or, put in a different way, which combinations of languages do people usually speak?
Most people in my dataset are polyglots who speak 4 languages. The most common combination of languages is Catalan, English, Japanese, and Spanish.
There are only 2 people who speak one language, and both of them speak only Japanese.
For the language combinations, I created an undirected network with Gephi to visualize the connections. Each connection or edge means that a person speaks both of those languages.
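The network itself was drawn in Gephi, but the idea behind the edges can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the profiles are made up, and each pair of languages spoken by the same person adds 1 to the weight of that edge.

```python
from collections import Counter
from itertools import combinations

# Made-up profiles for illustration only: the languages each person lists.
people = [
    ["Catalan", "English", "Japanese", "Spanish"],
    ["English", "Spanish"],
    ["Japanese"],
    ["English", "Italian", "Spanish"],
]

# Undirected weighted edges: every pair of languages a person speaks
# adds 1 to that pair's weight. Sorting keeps (A, B) and (B, A) as one edge.
edge_weights = Counter()
for languages in people:
    for pair in combinations(sorted(set(languages)), 2):
        edge_weights[pair] += 1

for (a, b), weight in edge_weights.most_common():
    print(f"{a} - {b}: {weight}")
```

Someone who speaks a single language adds no edges at all, so they only show up as an isolated node.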

The language network

As you can see, the edges for the most common combination (Catalan - English - Spanish - Japanese) have the most weight. And most languages are connected to either English or Catalan.
To visualize these connections more clearly, I isolated the ego networks* of the 4 most common languages (English, Japanese, Spanish, and Catalan).
I also isolated the Italian network, because it's one of the languages I speak and wasn't as common in the dataset.

*Ego network: a network made of a node and its neighbors, in which all nodes are linked to the main node, and edges between those neighbors are also included. (Aragón, P., 2020).
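As a rough sketch of that definition (not the Gephi workflow used for the charts), an ego network can be filtered out of a weighted edge list like this. The edges and the `ego_network` helper are made up for illustration.

```python
# Made-up weighted edges for illustration only: (language_a, language_b) -> weight.
edges = {
    ("Catalan", "English"): 2,
    ("Catalan", "Spanish"): 2,
    ("English", "Spanish"): 3,
    ("English", "Italian"): 1,
    ("German", "Italian"): 1,
}


def ego_network(edges, ego):
    """Keep the ego node, its direct neighbors, and every edge among them."""
    # Neighbors are the nodes that share at least one edge with the ego.
    neighbors = {b if a == ego else a for a, b in edges if ego in (a, b)}
    keep = neighbors | {ego}
    # An edge survives only if both of its endpoints are in the kept set.
    return {pair: w for pair, w in edges.items() if pair[0] in keep and pair[1] in keep}


catalan_ego = ego_network(edges, "Catalan")
print(catalan_ego)
```

Here the English - Spanish edge stays because both are neighbors of Catalan, while anything touching Italian or German drops out.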
The English network
The Japanese network
The Italian network
The Spanish network
The Catalan network
As expected, English is the most common second language.
Catalan is the next most common one, which I found unexpected. The Catalan language has the second most edges in the chart. So I became curious and reviewed this data further.
There are, in total, 59 people who speak Catalan in my dataset.
35 of them are from Catalonia.
4 from other Spanish-speaking countries (2 Argentinians, 1 Chilean, and 1 Dominican).
1 from a non-Spanish-speaking country (Italy).
And 19 of them didn't disclose their origin city.

So, in my dating pool, Catalans are not only polyglots, but their third/fourth/fifth languages are really diverse. There are more than 300 languages spoken in Barcelona*. Unfortunately, I couldn't find an official study or statistic with the percentages of people who speak certain languages, and how many others they speak.
And of course, with this data, it's impossible to know the proficiency of each language. For instance, even though I'm technically a polyglot, I'm not as proficient in Japanese as I am in English or Spanish (my native language). But, for dating app purposes, language proficiency doesn't matter.
So it's safe to conclude that, on Bumble, I live in a polyglot bubble.

9. Beer-drinking cat lady
Now, the most complex variable: interests. Up to five categorical values for each person.

My goals with the interest data were to visualize:
a. The most frequently inputted interests.
b. The relationship between people's interests.
c. The connection between my interests and theirs.

My 5 interests are design, writing, singing, pubs, and cats.

First, I listed all the 117 unique interests I found in my dataset and counted their occurrences. Oddly, only one of mine made it to the top 10.
After this, I created an undirected network with Gephi to visualize the other 2 goals.
In these graphs:
- The edge thickness and color correlate to the weight of the link.
- The label size correlates to the degree of the node. In other words, the number of edges connected to it.
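To make "degree" concrete, and to show how it differs from how often an interest appears, here's a small sketch. The profiles are made up for illustration, and counting each distinct pair of interests as a single edge is my own simplification.

```python
from collections import Counter
from itertools import combinations

# Made-up profiles for illustration only: the interests each person lists.
profiles = [
    ["pubs", "gym"],
    ["pubs", "gym"],
    ["pubs", "gym"],
    ["cats", "basketball", "photography"],
    ["photography", "cooking", "art"],
]

# Frequency: how many profiles mention each interest.
frequency = Counter(interest for p in profiles for interest in p)

# Distinct undirected edges: pairs of interests that share at least one profile.
edges = {pair for p in profiles for pair in combinations(sorted(set(p)), 2)}

# Degree: how many distinct interests each one is connected to.
degree = Counter(node for pair in edges for node in pair)

print(frequency["pubs"], degree["pubs"])
print(frequency["photography"], degree["photography"])
```

In this toy example "pubs" appears more often than "photography" but has a lower degree, because it always shows up next to the same interest.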

My interests are marked with a black box.

Note that the frequency of an interest doesn't necessarily correlate with its degree. The frequency of appearance and connections with other interests are two separate things.
This is something that I found quite interesting. Especially because:
1. Not all of the top 10 are strongly connected with other interests.
2. My interests have a very small presence in the network. Even "pubs", which is #2 in the frequency ranking.

Given this, I decided to isolate the ego networks* of my interests. In these individual networks, the main node is highlighted in black.

*Ego network: a network made of a node and its neighbors, in which all nodes are linked to the main node, and edges between those neighbors are also included. (Aragón, P., 2020).
By creating these visualizations I noticed that:
1. Not all of my 5 interests are connected to each other. There are connections between some of them, but not all.
2. Some of the most frequent interests ("photography", "cafe-hopping", "gym", and "cooking") also have the most connections. 
3. Connections are quite varied in nature.
There are some that feel logical, like people liking art, photography, and cafe-hopping. Some others baffle me, like the connection between "cats" and "basketball". And, naturally, "pubs" being connected to "gym" and "yoga" because there must be a balance between fitness and alcohol.
It's fascinating how complex the network of interests is. After all, we humans are diverse and multifaceted.
10. My average Marc
Of course, after organizing and analyzing all this data, I couldn't finish without the average profile: a compilation of the most frequent* variables in each category.
Why Marc? It was the most common male name in Barcelona in the nineties.†
*In age and height, this means the mode.
† Source: Instituto Nacional de Estadística (2021). Nombres más frecuentes por fecha de nacimiento. (This downloads an Excel file)
11. Quick recap
I asked many questions and I've been answering them all throughout this case study. But, to summarize: 

1. Which of my variables appear the most in my dating pool?
Interests, languages, drinks.

2. What's the ratio between women, men, and non-binary people in my dating pool?
91.33% men, 6.67% women, and 2% non-binary.

3. What's the most frequent and average age and height of the people in my dating pool?
Most frequent: 30 years and 180 cm.
Averages: 29.46 years and 175.95 cm.

4. From how many different countries are the people in my dating pool?
22 countries (and 49 cities).

5. What's the predominant zodiac sign in my dating pool?
Cancer. 18 people, 12%.

6. Are there any interesting connections between data points in my dating pool?
Yes, but if I explore ALL of them this case study would never see the light of day. I hope you found the ones I wrote about interesting.
12. In love with data
Given the many data points I've worked with, I could've combined many variables and explored tons of conclusions. I might do it on my own eventually because my curiosity is something to be feared (I mean, I dedicated weeks to something this extraneous), but for this case study, we're done.
It's detailed enough as it is, even though I barely scratched the surface.
If you're curious about more connections, feel free to write and ask (hola@fiichaves.com). Otherwise, I hope you enjoyed this needlessly detailed overview/analysis of the 150 profiles that I happened to browse during 13 summer days.

Definitely, there are things that could've gone better. I didn't collect all variables: I was missing "lives in", "profession" and "school". I decided to omit them because I wasn't interested in them at the moment, but perhaps I should've decided to discard them after collecting them, not before. After all, without them, I don't know what other insights I would've uncovered.
Also, because of how this website is built, I couldn't add interactive visualizations. But I want to, so that will eventually happen. Interaction is something that will greatly enhance this project.
And finally, I wanted to note how fast we swipe on these apps. For me, it was super time-consuming to get all of these people in a spreadsheet (that's why I stopped at 150), but swiping is a different matter. I would've never come to notice how fast I browse through people if I hadn't done this. Or how fast I decide I don't like someone, and how many people I don't like.
I'm not gonna talk about "how much better it was in the olden days" because nostalgia is futile and I enjoy modern ways. Also, I've met some great people and had interesting dates on this app before I got the spreadsheet out. The dynamics and the information to be learned here are fascinating, and I'm glad I scratched the surface.

Next step: get an actual date for city trips and beer 😅
(When I find dating less exhausting)

Special thanks to:
Tomás Bellagamba
Max Escobar
Virginia Soley

For proofreading and reviewing this project so carefully.
