Programming Languages As a Social Network

	Author:	“No Bugs” Hare
	Job Title:	Sarcastic Architect
	Hobbies:	Thinking Aloud, Arguing with Managers, Annoying HRs, Calling a Spade a Spade, Keeping Tongue in Cheek

The Idea

Recently, I was thinking about visualizing relations among different programming languages, and a thought crossed my mind:

what if we consider programming languages as a kinda social network and visualize them as such a network?

Methodology

I took quite a few more-or-less popular programming languages (33, to be exact), explicitly restricting myself to general-purpose ones. This eliminated DSLs such as R, as well as all dialects of SQL, HTML, CSS, and MATLAB.
For each language, I summed its normalized weights from three sources ([TIOBE], [Stack Overflow], [IEEE]), and used the sum as the “weight” of the programming language.
To treat programming languages as a kinda social network, we need a metric expressing their inter-relations. I decided to search Google for both “<language1> vs <language2>” and “<language2> vs <language1>” (quoted!), add up the numbers of pages reported by Google, and use the sum as the metric of inter-relation between the two languages (NB: unquoted requests grabbed too much garbage – such as pages about Dudley C Haskell when requesting unquoted C Haskell).
After the initial data was gathered, I took R (which, ironically, was eliminated from the analysis as discussed above) and built a graph, with languages as weighted vertices and the inter-relation data from Google as edge weights.
Then, I used R’s igraph package to visualize the graph, drawing it with the Fruchterman-Reingold algorithm (pretty much standard for this kind of visualization).
- NB: as with any such visualization, the result is inherently random, so different pictures are possible from the same data. I experimented a bit and took the picture I found the most visually appealing.
All the raw data and the programs used for the visualization are available, and I am going to publish them soon too.
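
The methodology above can be sketched in Python (the actual analysis used R and igraph; that R code is not reproduced here). This is a minimal, hypothetical sketch: the function names, the normalize-by-maximum rule for combining the TIOBE/Stack Overflow/IEEE scores, and rescaling edge weights to [0, 1] before they scale the attractive force are all assumptions. The layout loop is a simplified textbook Fruchterman-Reingold variant (the original paper's bounding-frame step is omitted), not igraph's implementation. The `seed` parameter illustrates the NB above: different random starting positions yield different pictures from the same data.

```python
import itertools
import math
import random

# ASSUMPTIONS (not from the article): each source is normalized by its maximum,
# and edge weights are rescaled to [0, 1] before scaling the attraction.


def node_weights(sources):
    """Sum each language's score across sources, each source divided by its max."""
    weights = {}
    for scores in sources:
        top = max(scores.values())
        for lang, score in scores.items():
            weights[lang] = weights.get(lang, 0.0) + score / top
    return weights


def quoted_query(a, b):
    """Exact-phrase query string, quotes included, e.g. '"C vs Haskell"'."""
    return f'"{a} vs {b}"'


def edge_weights(langs, hits):
    """hits maps a quoted query to its reported page count; both orders are summed."""
    edges = {}
    for a, b in itertools.combinations(langs, 2):
        w = hits.get(quoted_query(a, b), 0) + hits.get(quoted_query(b, a), 0)
        if w > 0:
            edges[(a, b)] = w
    return edges


def fruchterman_reingold(nodes, edges, iterations=200, size=1.0, seed=None):
    """Force-directed layout: pairwise repulsion k^2/d, weighted edge attraction d^2/k."""
    rng = random.Random(seed)  # random start: different seeds, different pictures
    pos = {v: [rng.uniform(-size, size), rng.uniform(-size, size)] for v in nodes}
    k = math.sqrt((2 * size) ** 2 / len(nodes))  # "ideal" edge length for the frame
    max_w = max(edges.values(), default=1)
    for step in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for a, b in itertools.combinations(nodes, 2):  # every pair repels
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = k * k / d
            disp[a][0] += dx / d * f
            disp[a][1] += dy / d * f
            disp[b][0] -= dx / d * f
            disp[b][1] -= dy / d * f
        for (a, b), w in edges.items():  # linked languages attract, heavier links more
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = (w / max_w) * d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        temp = 0.1 * size * (1 - step / iterations)  # linear cooling schedule
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step_len = min(d, temp)  # displacement capped by the temperature
            pos[v][0] += dx / d * step_len
            pos[v][1] += dy / d * step_len
    return {v: (x, y) for v, (x, y) in pos.items()}


if __name__ == "__main__":
    # Made-up numbers, for illustration only (NOT the article's data).
    sources = [{"C": 15.0, "C++": 8.0, "Rust": 0.5},
               {"C": 20.0, "C++": 22.0, "Rust": 3.0}]
    hits = {'"C vs C++"': 900, '"C++ vs C"': 400,
            '"C++ vs Rust"': 300, '"Rust vs C"': 50}
    weights = node_weights(sources)
    edges = edge_weights(list(weights), hits)
    print(fruchterman_reingold(list(weights), edges, seed=1))
```

With real data, `weights` would presumably drive vertex sizes and the returned positions would place the vertices when drawing; rerunning with another `seed` gives a different, equally valid picture.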

The result is shown in the picture above.

Sanity Check

It is interesting to note that even such a simple approach (which did NOT use any a priori information about the nature of the languages) shows quite a few of the groupings we would expect from our intrinsic knowledge about the languages; in particular, the following intuitively expected clusters can be seen:

C-C++-Rust (low-level languages w/o GC)
- A looser cluster of Delphi-asm-Lua-C-C++-Rust (embedded)
C#-Java (Garbage-Collected statically-typed)
Python-JS-PHP (dynamically-typed somewhat-web-related)
Objective-C/Swift (Apple)
Elixir-Erlang-Haskell-Scala-Clojure-F#-OCaml (mostly-functional)
Racket-Scheme-LISP-Clojure (LISP-like)

Overall, I’d say that in spite of the original data being very generic and containing no knowledge about the languages as such, the results do look reasonably sane to me.

Conclusion

We considered an unorthodox way to visualize programming languages and their inter-relations; we also cross-checked the result against our intrinsic knowledge about the languages involved, and it does look ok. What can be derived from such a visualization is yet to be seen; for now, let’s just enjoy the view…

References

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.

Comments

Ross Smith says

January 31, 2019 at 8:39 pm

To measure the relationship between two languages, simply taking the raw numbers of “lang1 vs lang2” hits biases the results in favour of popular languages. For example, look how thin the connection between Lisp and Scheme appears to be, compared to the much heavier link between C and C#, whereas I would say Lisp and Scheme are at least as closely related as C and C#. It might be better to look at the ratio of hits on “lang1 vs lang2” to individual hits on each of the two languages alone. (Perhaps the denominator should be the geometric mean of the two languages’ counts?)
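
A minimal sketch of the normalization suggested in this comment, assuming `vs_hits` is the combined “vs” hit count for a pair and `hits_a`/`hits_b` are each language's own hit counts (how those individual counts would be obtained is an assumption; `normalized_link` is a hypothetical name). All numbers below are made up purely to show how the ranking can flip.

```python
import math


def normalized_link(vs_hits, hits_a, hits_b):
    """'vs' hits divided by the geometric mean of the two languages' own hit counts."""
    return vs_hits / math.sqrt(hits_a * hits_b)


# Made-up counts, for illustration only:
c_csharp = normalized_link(1_000, 1_000_000, 400_000)  # popular pair, many raw hits
lisp_scheme = normalized_link(200, 10_000, 40_000)     # niche pair, few raw hits
print(c_csharp < lisp_scheme)  # True with these numbers
```

The geometric mean keeps the measure symmetric in the two languages and stops one very popular language from dominating the denominator the way an arithmetic mean would.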

- "No Bugs" Hare says
  
  February 2, 2019 at 8:54 am
  
  Of course, there are at least a million different ways to normalize this kind of stuff (and to visualize it too), so arguing that mine is “The Best One” would be outright silly; OTOH, I could argue that for less popular languages, which are farther from the mainstream, the relative weight of links has to decline to produce a meaningful overall picture (very roughly, I’d expect it to decline as 1/R^2, where R is the distance from the center). In other words, I could say that we shouldn’t try to scale the picture, but should use it only to see “hey, the closest language to Scheme is Lisp” – and this is not too far from the truth.
  
  Overall, the whole thing can be judged only on the scale of “whether the final picture as a whole makes sense” – and IMO the current one does surprisingly well in this regard (we did NOT use any inherent knowledge that C and C# are close, or that Lisp and Scheme are close, but this still emerged from purely statistical data from Google).