US Investment Activity by State from 2001

To understand investment activity for private companies in the US, I created an interactive map using CrunchBase data. Size shows the number of investment activities, and color shows whether that number increased or decreased. The number next to each state code shows the funding round count for the given year. All calculations are based on funding round count, not the amount raised.

Large View

Here are some quick observations. If the map is too hard to follow, please click the “Large View” link above to open it in a separate tab/window.

  • The data shows a big jump in funding activity across the US in 2005 compared to 2004. In 2005, companies in California raised 416 rounds compared to 73 in 2004. That’s roughly a 470% increase in just one year. During the same period, the number of investment rounds for companies in Massachusetts jumped from 18 to 93, roughly a 417% increase. However, NVCA’s yearbook (Link) does not suggest that such a jump existed, so it may simply be because TechCrunch started in 2005 (Link) and the data has less coverage of earlier years.
  • In 2008, New York passes Massachusetts in number of investment activities and stays ahead through 2013.
  • You will see the beginning of the recession in 2008, but both California and New York raised more rounds than they did in 2007. In 2009, the recession continues, and it is the only year since 2002 in which California’s number decreases.
  • In 2011, companies in both Illinois and Ohio double their number of investment activities. For Illinois, it could be due to a state-wide effort to encourage startups, such as the launch of Startup Illinois (link). Although I could not find a similar state-wide initiative for Ohio, there were online articles mentioning school and community-wide activities (Link).
  • 2013 shows many east coast states gaining more traction than west coast states, with North Carolina, Maryland, and Connecticut each closing more than double the number of rounds from the previous year. Although one year is too short to confirm a trend, it will be interesting to see whether this behavior continues in 2014.
  • Please share your own observations in comment section.
  • Thanks to CrunchBase for awesome dataset and I love the new look. You should check it out. (Link)
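
The year-over-year percentages in the first observation come from a simple percent-change calculation. Here is a minimal sketch in Python that reproduces them from the round counts quoted above; the function name is my own and is not part of the original analysis.

```python
def percent_increase(before: int, after: int) -> float:
    """Return the percentage increase from `before` to `after`."""
    if before == 0:
        raise ValueError("percent increase is undefined when the starting count is 0")
    return (after - before) / before * 100.0


# Funding round counts quoted in the observations (2004 -> 2005).
california = percent_increase(73, 416)
massachusetts = percent_increase(18, 93)

print(f"California: {california:.1f}%")
print(f"Massachusetts: {massachusetts:.1f}%")
```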

Boston Startup Scene and Job Search Using Open Data


I am moving to Boston, MA in June.

Although there are many aspects of living in a new city that I am looking forward to, the job search isn’t one I am excited about. It is not so much that being the “new guy” is no fun, but that I have to invest a lot of time to find “the company” to work at.

Sometimes, doing things the hard way pays off in the end. So I wanted to use data to understand what Boston’s startup scene is like, in the hope of narrowing down my search for interesting companies.


  • The data used for the analysis was gathered on March 10th, 2014, so any analysis including 2014 Q1 information is NOT complete.
  • The company, funding, and people data are from CrunchBase and AngelList. They are not disambiguated, so some double counting may be present. My intention is to present patterns rather than exact numbers to be quoted.
  • Network graphs can be browsed by dragging (to pan) and scrolling (to zoom). You can also click on any node to highlight its neighbor nodes.
  • Companies in some industries, such as health care, may be underrepresented since they are less likely to appear in CrunchBase or AngelList.

Boston Startup Scene


Large View

Market Relatedness Network (AngelList)
Nodes (250) : Market tags from AngelList
Edges (575) : Companies shared by two market tags
Color : Modularity, sub-networks/communities of markets
Size : # of Boston-based AngelList companies that have the particular tag

Boston startups are largely in the mobile, social media, SaaS, and e-commerce markets, with some distinct groups in gaming, healthcare, travel, and robotics.
Many of these large markets are also present in the market relatedness networks of other cities, but the hardware/robotics and healthcare markets take up a larger share in Boston than in other cities. Given the many academic and medical institutions in the Boston area, this result makes a lot of sense.
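
For readers who want to build a similar market relatedness network, here is a minimal sketch of the node and edge definitions listed above: node size is the number of companies carrying a tag, and edge weight is the number of companies shared by two tags. The input shape and the sample companies are hypothetical, and this is not the code used to produce the graph.

```python
from collections import Counter
from itertools import combinations


def market_network(company_tags):
    """Build node sizes and edge weights for a tag co-occurrence network.

    company_tags maps a company name to the list of market tags on its profile.
    Returns (node_size, edge_weight): node_size counts companies per tag, and
    edge_weight counts the companies shared by each (sorted) pair of tags.
    """
    node_size = Counter()
    edge_weight = Counter()
    for tags in company_tags.values():
        unique = sorted(set(tags))  # ignore duplicate tags on one profile
        node_size.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_size, edge_weight


# Hypothetical sample data, for illustration only.
sample = {
    "Company A": ["mobile", "saas"],
    "Company B": ["mobile", "saas", "analytics"],
    "Company C": ["mobile", "health care"],
}
node_size, edge_weight = market_network(sample)
print(node_size.most_common())
print(edge_weight.most_common())
```

Sorting the tags before pairing keeps each edge key in a single canonical order, so ("mobile", "saas") and ("saas", "mobile") are never counted separately.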


Company Count by Founding Year
* AngelList doesn't have founding year attribute, so I used "created_at" to see the # of company profiles added.

The decrease in the last two years is probably due to the fact that companies are added to CrunchBase based on major funding announcements. Looking at CrunchBase data, the number of companies founded in Boston increases steadily through 2012. AngelList data shows many startups being founded in the Boston area, but that may have to do with older companies being added to the AngelList platform. Comparing with other major cities such as San Francisco and New York, the decline in 2013 is not limited to Boston-based companies. It will be interesting to see whether this downward trend continues throughout 2014.

Company Count by Top Tags(AngelList)
Company Count by Top Tags(CrunchBase)

As the market relatedness network showed, mobile, social media, and SaaS are the top tags on both charts. The AngelList chart shows that education and health care companies are being founded, and the CrunchBase chart reveals that analytics companies are also trending.

Large View

Company Relatedness Network (AngelList)
Nodes (745) : Boston-based companies from AngelList
Edges (2706) : Market tags shared by two companies
Color : Modularity, sub-networks/communities of companies
Size : Follower count for each company

People’s Skills vs Job Skills


Most developer skills are around mobile and web development. The ratio between developers and development jobs for each skill ranges from 1:5 to 1:2, except for iOS development and Objective-C, which are in very high demand. If you are an iOS developer, the data shows that you will have a better chance of finding a job. =)

Designer & PM

It is interesting to see that many development skills are required for designer jobs and some PM jobs.

Large View

People Skill Network (AngelList)
Nodes (200) : Top 200 people in Boston based on betweenness centrality score
Edges (5327) : Skill similarity between people
Color : Modularity
Size : Betweenness centrality score

This is more or less for fun. Even after keeping only people with more than 3 skills, there are over 2200 people in Boston. Since Sigma.js is not capable of showing a network of that size, I filtered the network by betweenness centrality score. Coloring by modularity reveals sub-networks of developers (pink), product managers (green), and everyone else (blue).
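
Filtering a network down to its top nodes by betweenness centrality can be sketched as follows. This is a plain-Python version of Brandes' algorithm for an unweighted, undirected graph, written only to illustrate the idea; the function names and the adjacency-set input format are my own assumptions, and a graph tool or library would normally do this step.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adj maps each node to the set of its neighbors in an unweighted,
    undirected graph.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is visited from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in score.items()}


def top_n_subgraph(adj, n):
    """Keep the n highest-scoring nodes and the edges among them."""
    scores = betweenness(adj)
    keep = set(sorted(scores, key=scores.get, reverse=True)[:n])
    return {v: adj[v] & keep for v in keep}


# Hypothetical toy graph: a path a - b - c, where b bridges a and c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = betweenness(path)
print(scores)
print(top_n_subgraph(path, 1))
```

For the people skill network, the same idea would be applied with n=200 to the graph of people connected by skill similarity.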


Large View

Co-investment Investor Network (CrunchBase)
Nodes (65) : Investors (both institutional and individual) who invested in Boston-based startups
Edges (135) : Companies co-invested in by two investors (more than 2)
Color : Type of investor, either institutional or individual
Size : # of funding rounds participated in Boston-based companies

5 Most Active Investors in Boston on AngelList

Selected based on 2013 funding round count

5 Most Active Investors in Boston on CrunchBase

Selected based on 2013 funding round count

Keep in mind that some investment rounds may be double counted. It is more important to see the increasing pattern in funding amounts. The spike in 2014 Q1 in the CrunchBase data is due to a massive funding round ($600M) for an energy company, Cape Wind.

Search for a Skill Fit

Large View

Company Relatedness Network (AngelList)
Nodes (209) : Boston-based companies
Edges (744) : Connected by number of employees with similar skills
Color : Relevancy. More relevant to me (Green), less but still relevant (Blue)
Size : Relevancy. Nodes directly connected to me are slightly larger

Here is an attempt to build a short list of companies based on employees who have a skill set similar to mine. I don’t know much about these companies, but many of them seem to be in marketing and analytics. Even though I am not limiting my search to these companies, it is a good starting point.
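
One way to sketch this kind of skill-fit ranking is to count, per company, the employees whose skill set overlaps strongly with mine. The Jaccard similarity measure, the 0.3 threshold, and the sample data below are all assumptions made for illustration; the post does not state how skill similarity was actually computed.

```python
def jaccard(a, b):
    """Jaccard similarity of two skill sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_companies(my_skills, employees, threshold=0.3):
    """Rank companies by the number of employees with similar skills.

    employees is a list of (company, skill_set) pairs. An employee counts
    as similar when the Jaccard similarity reaches `threshold` (an
    illustrative cutoff, not a value from the original analysis).
    """
    counts = {}
    for company, skills in employees:
        if jaccard(my_skills, skills) >= threshold:
            counts[company] = counts.get(company, 0) + 1
    # Most similar employees first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data, for illustration only.
my_skills = {"python", "sql", "d3"}
employees = [
    ("Company X", {"python", "sql"}),
    ("Company X", {"python", "d3", "r"}),
    ("Company Y", {"java"}),
    ("Company Z", {"sql", "excel", "python", "tableau"}),
]
ranking = rank_companies(my_skills, employees)
print(ranking)
```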

What’s Next?

  • Does the downward trend continue in 2014? What is causing it?
  • TF-IDF normalization on tags to compare with different cities’ data.
  • Disambiguate companies, people, and other documents
  • More cities.
  • Add more data sources such as LinkedIn, GitHub, etc.
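
As a rough idea of what the TF-IDF item above could look like, here is a minimal sketch that treats each city as a document and each tag as a term, so tags common to every city are down-weighted and city-specific tags stand out. The function name, the unsmoothed IDF variant, and the sample counts are assumptions, not results from the analysis.

```python
import math


def tag_tfidf(city_tag_counts):
    """Compute TF-IDF scores for tags, treating each city as a document.

    city_tag_counts maps a city to a dict of {tag: company count}.
    TF is the tag's share of the city's tag counts; IDF is
    log(number of cities / number of cities using the tag), unsmoothed.
    """
    n_cities = len(city_tag_counts)
    doc_freq = {}
    for counts in city_tag_counts.values():
        for tag in counts:
            doc_freq[tag] = doc_freq.get(tag, 0) + 1

    scores = {}
    for city, counts in city_tag_counts.items():
        total = sum(counts.values())
        scores[city] = {
            tag: (count / total) * math.log(n_cities / doc_freq[tag])
            for tag, count in counts.items()
        }
    return scores


# Hypothetical counts, for illustration only.
tfidf = tag_tfidf({
    "Boston": {"mobile": 50, "robotics": 10},
    "San Francisco": {"mobile": 80, "saas": 20},
})
print(tfidf["Boston"])
```

With this unsmoothed variant, a tag that appears in every city scores exactly zero, which is the intended effect when the goal is to compare what is distinctive about each city.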


  • W. Bryan Smith(LinkedIn and Twitter @questgen) for brainstorming ideas and feedback.
  • Joshua Slayton(Twitter @joshuaxls) for AngelList data request

Visualizing 350 million people movement in US

A screenshot from 48 states animation (during the time of hurricane Katrina)

I worked on visualizing US address change records for the last two months, and my work has been published on my company’s blog.
I originally started by getting high-level pictures from Gephi but realized that Gephi wasn’t quite suited for what I wanted to convey. So I wrote a program in Java using the Processing library to gain finer control over visual primitives for coloring, sizing, and animation.

So head over and take a look at the post by clicking the link below.

Making Sense of AngelList #1 : Investors


A screenshot of AngelList mainpage


AngelList is probably the largest open network of startups, founders, and investors. It also provides a nice API for others like myself to play with the data. I have had some fun analyzing the dataset since January and wanted to be a bit more formal about sharing the results. So I will be organizing the methodologies and results as a series of posts instead of tweets.

Understanding investors has multiple benefits.

  1. One can see the trend in markets. It is important not only for identifying pain points but also pivoting on your existing business or ideas.
  2. Use it to target more relevant investors. Perhaps even a lead investor.
  3. And more…



Investors are filtered from a full list of users who had a “startup role” of “past_investor”.

Primary Locations and Meta Location

  • Investors’ primary location was chosen as the first in the “locations” attribute.
  • Meta location was determined by manually merging primary locations.
  • There may be some inconsistencies or misrepresentation of some investors’ location.


Connections are drawn by finding the number of co-invested companies between two investors. For example, if “investor 1” and “investor 2” both invested in “company A” and “company B”, there will be a link drawn between them with weight “2.”
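
The edge weighting described above can be sketched in a few lines: for every pair of investors, the weight is the number of companies both have invested in. The function name, the `min_weight` parameter, and the input format are hypothetical; the sample data mirrors the “investor 1” / “investor 2” example.

```python
from itertools import combinations


def co_investment_edges(portfolios, min_weight=1):
    """Return {(investor_a, investor_b): weight} for co-investing pairs.

    portfolios maps an investor to the set of companies they invested in.
    The weight is the number of shared companies; pairs whose weight is
    below `min_weight` are dropped.
    """
    edges = {}
    for a, b in combinations(sorted(portfolios), 2):
        shared = portfolios[a] & portfolios[b]
        if len(shared) >= min_weight:
            edges[(a, b)] = len(shared)
    return edges


# Sample data following the example in the text ("investor 3" is made up).
portfolios = {
    "investor 1": {"company A", "company B"},
    "investor 2": {"company A", "company B", "company C"},
    "investor 3": {"company C"},
}
edges = co_investment_edges(portfolios)
strong_edges = co_investment_edges(portfolios, min_weight=2)
print(edges)
print(strong_edges)
```

Comparing every pair is quadratic in the number of investors, which is fine for a sketch but may need an inverted company-to-investors index for a large dataset. Raising `min_weight` corresponds to filtering the network by edge weight.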

The Network

The network of investors with no threshold

Filtered by edge weight, sized by betweenness centrality score, colored by meta locations



Centrality of investors versus followers and number of companies invested

Sized by betweenness centrality score, colored by number of companies invested


Scatterplot of betweenness centrality score and number of companies invested


Sized by betweenness centrality score, colored by number of followers


Scatterplot of betweenness centrality score and number of followers


Both the number of followers and the number of companies invested in show some correlation with betweenness centrality score. The correlation with the number of companies invested in is expected, since the network was generated from co-investments.
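
To put a number on “some correlation”, one option is the Pearson correlation coefficient between centrality scores and follower (or investment) counts. The sketch below is an assumption about how this could be measured, not the method used for the scatterplots, and the toy inputs are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values: perfectly increasing and perfectly decreasing pairs.
increasing = pearson([1, 2, 3], [2, 4, 6])
decreasing = pearson([1, 2, 3], [3, 2, 1])
print(increasing, decreasing)
```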

Giant cluster of Silicon Valley investors

Closer look at the central cluster of Silicon Valley investors


I don’t know whether AngelList data is skewed toward Silicon Valley investors or whether many investors list SV as their primary location even if they don’t live there, but SV investors make up a large majority and they are very central. They are well-connected to pretty much every group and intermingled with the second largest group, NYC/Boston investors (teal color).

Because of their number of investments, David McClure and 500 Startups have the highest betweenness centrality scores and rank at the top on pretty much all other centrality measures.

Investors within the Silicon Valley region


Within Silicon Valley, there are no distinct sub-groups based on smaller regions.

Silicon Valley investors acting as hubs

Brad Holden, a Silicon Valley investor is positioned to connect many Los Angeles based investors.


There are many examples of SV investors acting as hubs to other regional groups of investors. The most prominent one is Brad Holden (bottom right), who connects a very well-connected group of Los Angeles investors.

Joshua Baer is a Texas based investor who is connecting many investors in the same region.


Another pattern is an investor who is based outside of Silicon Valley but has made many investments with SV investors, and so acts as a hub for regional investors. Joshua Baer (top center) and Bill Boebel are both based in Texas but have many co-investment connections with SV investors, and they connect other Texas-based investors.

Ideas for Further Analysis

I wish I had been able to get some temporal information to do more advanced analysis, such as

  • A group of investors acting as a flock: how do certain attributes of investors inform/motivate other investors to act together?
  • How does information about startups spread between investors?

Shout Out

  • Babak Nivi(@nivi) for suggestion of ideas.
  • Joshua Slayton(@joshuaxls) for answering questions and accommodating additional data requests.

Thank you friends.

I want to thank everyone who was kind enough to introduce me to companies and people for opportunities.
After talking to various companies, I am joining Spokeo as a data scientist to analyze their dataset.

I intentionally didn’t look for a job right after my last employment (well, at least partially). It was a great experiment for me to try breaking out of a financial comfort zone. I wouldn’t say that I succeeded, but it was good enough to make me realize that “what I do STILL defines a large part of my identity”.

I hope to try this again and someday, I will learn how to “embrace the uncertainty!”

Again, thank you everyone for this valuable lesson. =)