Making Sense of AngelList #1 : Investors

Introduction

A screenshot of AngelList mainpage

AngelList is probably the largest open network of startups, founders, and investors. It also provides a nice API that lets others, like myself, play with the data. I have been having fun analyzing the dataset since January and wanted to share the results a bit more formally, so I will be organizing the methodologies and results as a series of posts instead of tweets.

Understanding investors has multiple benefits.

  1. You can see trends in markets. This matters not only for identifying pain points but also for pivoting your existing business or ideas.
  2. You can target more relevant investors, perhaps even a lead investor.
  3. And more…

Methodology

Investors

Investors are filtered from a full list of users who had a “startup role” of “past_investor”.
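
A minimal sketch of this filter, assuming each user record is a dict with a "roles" list whose entries carry a "role" field. The record layout, the `filter_investors` helper, and the sample data are all hypothetical; the actual AngelList API response may be shaped differently.

```python
def filter_investors(users, role="past_investor"):
    """Return the users that hold the given startup role at least once."""
    return [
        user for user in users
        if any(r.get("role") == role for r in user.get("roles", []))
    ]

# Hypothetical sample records (not real AngelList data).
users = [
    {"id": 1, "name": "investor 1",
     "roles": [{"role": "past_investor", "startup": "company A"}]},
    {"id": 2, "name": "founder 1",
     "roles": [{"role": "founder", "startup": "company A"}]},
    {"id": 3, "name": "investor 2",
     "roles": [{"role": "past_investor", "startup": "company B"},
               {"role": "advisor", "startup": "company C"}]},
]

investors = filter_investors(users)
print([u["name"] for u in investors])
```

A user with several roles is kept as long as one of them matches, so advisors or founders who also invested still count as investors.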

Primary Locations and Meta Location

  • Each investor’s primary location is the first entry in the “locations” attribute.
  • Meta locations were determined by manually merging primary locations.
  • As a result, some investors’ locations may be inconsistent or misrepresented.
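
The two steps above can be sketched as follows. The `META_LOCATIONS` table is a made-up stand-in for the manual merge (the real mapping is not given in this post), and the "Other" fallback is an assumption.

```python
# Hypothetical manual mapping from primary locations to meta locations.
META_LOCATIONS = {
    "San Francisco": "Silicon Valley",
    "Palo Alto": "Silicon Valley",
    "New York City": "NYC/Boston",
    "Boston": "NYC/Boston",
    "Austin": "Texas",
}

def primary_location(investor):
    """First entry of the "locations" attribute, or None if it is empty."""
    locations = investor.get("locations") or []
    return locations[0] if locations else None

def meta_location(investor, mapping=META_LOCATIONS, default="Other"):
    """Merged region for the investor's primary location."""
    return mapping.get(primary_location(investor), default)

print(meta_location({"locations": ["Palo Alto", "Austin"]}))
```

Only the first listed location is used, which is exactly where the possible misrepresentation comes from: an investor who lists "Palo Alto" first but lives in Austin would be counted as Silicon Valley.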

Connections

Connections are drawn by finding the number of co-invested companies between two investors. For example, if “investor 1” and “investor 2” both invested in “company A” and “company B”, there will be a link drawn between them with weight “2.”
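
Here is a small sketch of that weighting, plus the edge-weight filter used for the network images. The input format (a dict from investor to the companies they invested in), the function names, and the `>=` threshold rule are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def co_investment_edges(portfolios):
    """Map each investor pair to the number of companies both invested in."""
    # Invert investor -> companies into company -> investors.
    backers = {}
    for investor, companies in portfolios.items():
        for company in set(companies):
            backers.setdefault(company, set()).add(investor)
    # Each company adds 1 to the weight of every pair of its investors.
    weights = Counter()
    for investors in backers.values():
        for pair in combinations(sorted(investors), 2):
            weights[pair] += 1
    return weights

def filter_by_weight(weights, threshold):
    """Keep only edges whose weight is at least the threshold."""
    return {pair: w for pair, w in weights.items() if w >= threshold}

portfolios = {
    "investor 1": ["company A", "company B"],
    "investor 2": ["company A", "company B"],
    "investor 3": ["company B"],
}
edges = co_investment_edges(portfolios)
print(edges[("investor 1", "investor 2")])  # 2
```

Pairs are stored in sorted order so that each undirected link appears only once.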

The Network

The network of investors with no threshold

Filtered by edge weight, sized by betweenness centrality score, colored by meta locations
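
For readers unfamiliar with the score used for node size, below is a rough sketch of betweenness centrality using Brandes' algorithm on an unweighted, undirected graph. It is only an illustration: it ignores edge weights and does not normalize, so it may not match the exact settings behind these images, and the adjacency-dict input is an assumed format.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adj maps each node to an iterable of its neighbors.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(betweenness(star)["hub"])  # 3.0
```

In the star example the hub sits on the only path between each of the three pairs of leaves, which is the same intuition behind an investor who bridges otherwise separate groups.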

Results

Centrality of investors versus followers and number of companies invested

Sized by betweenness centrality score, colored by number of companies invested

Scatterplot of betweenness centrality score and number of companies invested

Sized by betweenness centrality score, colored by number of followers

Scatterplot of betweenness centrality score and number of followers


Both the number of followers and the number of companies invested in show some correlation with the betweenness centrality score. The correlation with the number of companies invested in is expected, since the network was generated from co-investments.
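
The scatterplots show the relationship visually. One common way to put a number on it is the Pearson correlation coefficient, sketched below. This is an assumed choice of measure, and the sample values are invented for illustration rather than taken from the dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Invented values: companies invested in vs. betweenness score.
companies_invested = [1, 3, 5, 10, 40]
betweenness_score = [0.0, 2.5, 4.0, 30.0, 310.0]
print(round(pearson(companies_invested, betweenness_score), 3))
```

Centrality scores tend to be heavily skewed, so a rank-based measure such as Spearman's may be a better fit for data like this.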

Giant cluster of Silicon Valley investors

Closer look at the central cluster of Silicon Valley investors

I don’t know whether the AngelList data is skewed toward Silicon Valley investors, or whether many investors list SV as their primary location even if they don’t live there, but SV investors make up a large majority and are very central. They are well connected to pretty much every group and intermingled with the second-largest group, NYC/Boston investors (teal).

David McClure and 500 Startups, because of their number of investments, have the highest betweenness centrality scores and rank highest on pretty much all other centrality measures.

Investors within the Silicon Valley region

Within Silicon Valley, there are no distinct sub-groups based on smaller regions.

Silicon Valley investors acting as hubs

Brad Holden, a Silicon Valley investor, is positioned to connect many Los Angeles-based investors.

There are many examples of SV investors acting as hubs to other regional groups of investors. The most prominent one is Brad Holden (bottom right), who connects a very well-connected group of Los Angeles investors.

Joshua Baer is a Texas-based investor who is connecting many investors in the same region.

Another pattern is an investor who is based outside Silicon Valley but has made many investments with SV investors, and so acts as a hub for regional investors. Joshua Baer (top center) and Bill Boebel are both based in Texas, have many co-investment connections with SV investors, and connect other Texas-based investors.

Ideas for Further Analysis

I wish I had temporal information to do more advanced analysis, such as:

  • Groups of investors acting as flocks – How do certain attributes of investors inform/motivate other investors to act together?
  • How does information about startups disperse between investors?

Shout Out

  • Babak Nivi (@nivi) for suggesting ideas.
  • Joshua Slayton (@joshuaxls) for answering questions and accommodating additional data requests.

10 Comments

  1. Finding the people that can connect you with different hubs (like Joshua Baer) is very cool and potentially useful.

  2. Excellent. Keep’em coming.

  3. > Ideas for Further Analysis with temporal information

    I hear you because traditional SNA/ONA analysis like with centrality and betweenness only reveals so much…

    Dynamic temporal information is more valuable because relationships continuously change, which impacts the direction of information flow so if you can capture the context of conversations you can make predictions on the ever-changing dynamics happening within a network e.g. topics based on changing interests and how that translates into identifying the right key influencers at any given time. 

  4. Very interesting! Thank you for sharing this..keep it coming!

  5. Hey Sol — very cool analysis, thanks for pulling this together. Have you made this model public/navigable by others? I ask b/c I’m focused on building deeper + broader connections between Silicon Valley + the Pacific Northwest (especially SEA + PDX) so was glad to see my name on there but would love to know more about the other “human switches” currently making that happen. (I’m crashdev at crashdev dot com if you want to rap offline).

    • Chris, I haven’t made the model public yet.
      I am planning to do that towards the end of the series. I would be interested to hear your methodology as well. =)

  6. @soleun: Those of you who wants to see a larger version of @angellist investor network, I’ve created a seadragon export. https://t.co/ecXC86Vufl

  7. What is that cluster at 4 o’clock in that first ‘no threshold’ image? Is it NYC?

    • Michael, that area is quite interesting. It is hard to point out they are from one location because the cluster is a mix of SV, NYC, Europe and other regions.
      I think it is worth investigating. Thanks for pointing it out.