Monday, June 29, 2015

Tabular and Visual Representations of Data using Neo4J

Corporate and Employee Relationships
Both Graphical and Tabular Results

So, there are many ways to view data, and people may have different needs for representing that data, either for visualization (in a graph:node-edges-view) or for tabulation/sorting (in your standard spreadsheet view).

So, can Neo4J cater to both these needs?

Yes, it can.

Scenario 1: Relationships of owners of multiple companies

Let's say I'm doing some data exploration, and I wish to know who has interest/ownership in multiple companies? Why? Well, let's say I'm interested in the Peter-Paul problem: I want to know if Joe, who owns company X is paying company Y for whatever artificial scheme to inflate or to deflate the numbers of either business and therefore profit illegally thereby.

Piece of cake. Neo4J, please show me the owners, sorted by the number of companies owned:

MATCH (o:OWNER)--(p:PERSON)-[r:OWNS]->(c:CORP)
RETURN p.ssn AS Owner, collect(c.name) as Companies, count(r) as Count 
ORDER BY Count DESC


Diagram 1: Owners by Company Ownership

Boom! There you go. Granted, this isn't a very exciting data set, as I did not have many owners owning multiple companies, but there you go.

What does it look like as a graph, however?

MATCH (o:OWNER)--(p:PERSON)-[r:OWNS]->(c:CORP)-[:EMPLOYS]->(p1) 
WHERE p.ssn in [2879,815,239,5879] 
RETURN o,p,c,p1


Diagram 2: Some companies with multiple owners

To me, this is a richer result, because it now shows that owners of more than one company sometimes own shares in companies that have multiple owners. This may yield interesting results when investigating associates who own companies related to you. This was something I didn't see in the tabular result.

Not a weakness of Neo4J: it was a weakness on my part doing the tabular query. I wasn't looking for this result in my query, so the table doesn't show it.

Tellingly, the graph does.

Scenario 2: Contract-relationships of companies 

Let's explore a different path. I wish to know, by company, the contractual-relationships between companies, sorted by companies with the most contractual-relationships on down. How do I do that in Neo4J?

MATCH (c:CORP)-[cc:CONTRACTS]->(c1:CORP) 
RETURN c.name as Contractor, collect(c1.name) as Contractees, count(cc) as Count 
ORDER BY Count DESC


Diagram 3: Contractual-Relationships between companies

This is somewhat more fruitful, it seems. Let's, then, put this up into the graph-view, looking at the top contractor:

MATCH (p:PERSON)--(c:CORP)-[:CONTRACTS*1..2]->(c1:CORP)--(p1:PERSON) 
WHERE c.name in ['YFB'] 
RETURN p,c,c1,p1


Diagram 4: Contractual-Relationships of YFB

Looking at YFB, we can see contractual-relationships 'blossom-out' from it, as it were, and this is just immediate, then distance 1 from that out! If we go out even just distance 1 more in the contracts, the screen fills with employees, so then, again, you have the forest-trees problem where too much data is hiding useful results with data.

Let's prune these trees, then. Do circular relations appear?

MATCH (c:CORP)-[:CONTRACTS*1..5]->(c1:CORP) WHERE c.name in ['YFB'] RETURN c,c1


Diagram 5: Circular Relationship found, but not in YFB! Huh!

Well, would you look at that. This shows the power of the visualization aspect of graph databases. I was examining a hot-spot in corporate trades, YFB, looking for irregularities there. I didn't find any, but as I probed there, a circularity did surface in downstream, unrelated companies: the obvious one being between AZB and MZB, but there's also a circular-relationship that becomes apparent starting with 4ZB, as well. Yes, this particular graph is noisy, but it did materialize an interesting area to explore that may very well have been overlooked with legacy methods of investigation.

Graph Databases.


BAM.

No comments: