Introduction
AngelList is probably the largest open network of startups, founders, and investors. It also provides a nice API for others like myself to play with the data. I have been having fun analyzing the dataset since January and wanted to be a bit more formal about sharing the results, so I will be organizing the methodologies and results as a series of posts instead of tweets.
Understanding investors has multiple benefits.
- One can see trends in markets. This matters not only for identifying pain points but also for pivoting an existing business or idea.
- One can target more relevant investors, perhaps even a lead investor.
- And more…
Methodology
Investors
Investors were selected from the full list of users by keeping those who had a “startup role” of “past_investor”.
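As a rough illustration of this filtering step, here is a minimal Python sketch. The record layout (a `startup_roles` list of dicts with a `role` key) and the sample users are assumptions made up for the example, not the actual API response or the code used for this post.

```python
def filter_investors(users, role="past_investor"):
    """Keep users that have at least one startup role equal to `role`.

    Assumes each user is a dict with a "startup_roles" list (hypothetical layout).
    """
    return [
        user
        for user in users
        if any(r.get("role") == role for r in user.get("startup_roles", []))
    ]


# Hypothetical sample records, not real AngelList data.
users = [
    {"name": "investor 1", "startup_roles": [{"role": "past_investor"}]},
    {"name": "founder 1", "startup_roles": [{"role": "founder"}]},
    {"name": "investor 2", "startup_roles": [{"role": "advisor"}, {"role": "past_investor"}]},
    {"name": "no roles"},
]

investors = filter_investors(users)
print([u["name"] for u in investors])
```

A user with several roles is kept as long as one of them matches, and a user with no roles at all is simply skipped.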
Primary Locations and Meta Location
- Investors’ primary location was chosen as the first in the “locations” attribute.
- Meta location was determined by manually merging primary locations.
- As a result, some investors’ locations may be inconsistent or misrepresented.
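The location rules above can be sketched roughly as follows. The mapping entries are illustrative guesses at what a manual merge might look like; the real merge was done by hand and its exact contents are not listed in this post.

```python
# Hand-maintained mapping from primary location to meta location.
# These entries are illustrative assumptions, not the mapping used for the post.
META_LOCATION = {
    "San Francisco": "Silicon Valley",
    "Palo Alto": "Silicon Valley",
    "New York City": "NYC/Boston",
    "Boston": "NYC/Boston",
    "Austin": "Texas",
}


def primary_location(investor):
    """Return the first entry of the "locations" attribute, or None if it is empty."""
    locations = investor.get("locations") or []
    return locations[0] if locations else None


def meta_location(investor, mapping=META_LOCATION):
    """Map the primary location to its merged group; unmapped locations stay as-is."""
    primary = primary_location(investor)
    if primary is None:
        return None
    return mapping.get(primary, primary)


# Hypothetical investors, not real data.
print(meta_location({"locations": ["Palo Alto", "Austin"]}))
print(meta_location({"locations": []}))
```

Taking only the first location is what makes misrepresentation possible: an investor who lists "Palo Alto" first but lives in Austin ends up in the Silicon Valley group.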
Connections
Connections are drawn by counting the companies that two investors have both invested in. For example, if “investor 1” and “investor 2” both invested in “company A” and “company B”, a link is drawn between them with weight “2.”
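The weighting rule can be sketched in a few lines of Python. The input shape (a dict from company to its list of investors) and the function name are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations


def co_investment_weights(investments):
    """Count shared companies for every pair of investors.

    investments: dict mapping a company name to an iterable of investor names.
    Returns a Counter keyed by (investor_a, investor_b) with a < b alphabetically.
    """
    weights = Counter()
    for investors in investments.values():
        # set() guards against an investor listed twice for one company;
        # sorted() makes each pair appear in one canonical order.
        for a, b in combinations(sorted(set(investors)), 2):
            weights[(a, b)] += 1
    return weights


# Toy data mirroring the example in the text, plus one extra investor.
investments = {
    "company A": ["investor 1", "investor 2"],
    "company B": ["investor 2", "investor 1", "investor 3"],
}

weights = co_investment_weights(investments)
for pair, weight in sorted(weights.items()):
    print(pair, weight)
```

Here the link between “investor 1” and “investor 2” gets weight 2, while each of them gets a weight-1 link to “investor 3” through “company B”.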
The Network
Results
Centrality of investors versus followers and number of companies invested
Both the number of followers and the number of companies invested in show some correlation with the betweenness centrality score. The correlation with the number of companies invested in is expected, since the network was generated from co-investments.
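For readers who want to try this kind of comparison themselves, below is a minimal sketch, not the code behind the results above. It computes unweighted betweenness centrality with Brandes' algorithm and compares it with a per-investor count using the Pearson coefficient. Ignoring link weights and using Pearson are both assumptions on my part, and the toy graph and counts are made up.

```python
from collections import deque
from math import sqrt


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes).

    adjacency: dict mapping each node to the set of its neighbours (symmetric).
    """
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: s / 2 for v, s in score.items()}


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy star-shaped network: one hub co-invested with three others (hypothetical).
adjacency = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
companies_invested = {"hub": 5, "a": 2, "b": 1, "c": 1}  # made-up counts

centrality = betweenness_centrality(adjacency)
nodes = list(adjacency)
r = pearson([centrality[n] for n in nodes], [companies_invested[n] for n in nodes])
print(centrality, r)
```

In the toy network every shortest path between two outer investors passes through the hub, so the hub is the only node with a non-zero score.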
Giant cluster of Silicon Valley investors
I don’t know whether the AngelList data is skewed toward Silicon Valley investors or whether many investors list SV as their primary location even if they don’t live there. Either way, SV investors make up a large majority and are very central. They are well connected to pretty much every group and intermingled with the second-largest group, NYC/Boston investors (teal).
David McClure and 500 Startups, because of their number of investments, have the highest betweenness centrality scores and lead on pretty much all other centrality measures.
Within Silicon Valley, there are no distinct sub-groups based on smaller regions.
Silicon Valley investors acting as hubs
There are many examples of SV investors acting as hubs to other regional groups of investors. The most prominent one is Brad Holden (bottom right), who connects a very well-connected group of Los Angeles investors.
Another pattern is an investor who is based outside Silicon Valley but has made many investments with SV investors, and so acts as a hub for regional investors. Joshua Baer (top center) and Bill Boebel are both based in Texas, have many co-investment connections with SV investors, and connect other Texas-based investors.
Ideas for Further Analysis
I wish I had some temporal information for more advanced analysis, such as:
- A group of investors acting as flocks: how do certain attributes of investors inform or motivate other investors to act together?
- How does information about startups disperse between investors?
Shout Out
- Babak Nivi (@nivi) for suggesting ideas.
- Joshua Slayton (@joshuaxls) for answering questions and accommodating additional data requests.
March 17, 2013 at 11:42 am
Finding the people that can connect you with different hubs (like Joshua Baer) is very cool and potentially useful.
March 17, 2013 at 11:48 am
I agree. Temporal information could help build a prediction model as well.
March 17, 2013 at 11:47 am
Excellent. Keep’em coming.
March 17, 2013 at 2:32 pm
> Ideas for Further Analysis with temporal information
I hear you because traditional SNA/ONA analysis like with centrality and betweenness only reveals so much…
Dynamic temporal information is more valuable because relationships continuously change, which impacts the direction of information flow so if you can capture the context of conversations you can make predictions on the ever-changing dynamics happening within a network e.g. topics based on changing interests and how that translates into identifying the right key influencers at any given time.
March 18, 2013 at 6:04 am
Very interesting! Thank you for sharing this..keep it coming!
March 18, 2013 at 10:52 am
Hey Sol — very cool analysis, thanks for pulling this together. Have you made this model public/navigable by others? I ask b/c I’m focused on building deeper + broader connections between Silicon Valley + the Pacific Northwest (especially SEA + PDX) so was glad to see my name on there but would love to know more about the other “human switches” currently making that happen. (I’m crashdev at crashdev dot com if you want to rap offline).
March 19, 2013 at 8:16 am
Chris, I haven’t made the model public yet.
I am planning to do that towards the end of the series. I would be interested to hear your methodology as well. =)
March 18, 2013 at 11:17 am
@soleun: Those of you who wants to see a larger version of @angellist investor network, I’ve created a seadragon export. https://t.co/ecXC86Vufl
March 18, 2013 at 11:29 am
What is that cluster at 4 o’clock in that first ‘no threshold’ image? Is it NYC?
March 19, 2013 at 8:24 am
Michael, that area is quite interesting. It is hard to point out they are from one location because the cluster is a mix of SV, NYC, Europe and other regions.
I think it is worth investigating. Thanks for pointing it out.