R可视化:桑基图

2015/04/09

Categories: R Tags: 可视化 桑基图 plot

桑基图(Sankey diagram),即桑基能量分流图,也叫桑基能量平衡图。它是一种特定类型的流程图,图中延伸的分支的宽度对应数据流量的大小,通常应用于能源、材料成分、金融等数据的可视化分析。因1898年Matthew Henry Phineas Riall Sankey绘制的“蒸汽机的能源效率图”而闻名,此后便以其名字命名为“桑基图”。桑基图可以用来表示各个节点之间转换(转化率、客流走向/分流等情况)

在R中可以直接定义使用data.frame/list等数据结构定义节点之间的关系,再使用相应的package画桑基图

静态图片展示:riverplot

library(riverplot)

# 构造连接节点的数据框
edges = data.frame(
  N1 = paste0(rep(LETTERS[1:4], each = 4), rep(1:5, each = 16)),
  N2 = paste0(rep(LETTERS[1:4], 4), rep(2:6, each = 16)),
  Value = runif(80, min = 2, max = 5) * rep(c(1, 0.8, 0.6, 0.4, 0.3), each = 16),
  stringsAsFactors = F
)
# 筛选80%的记录,以免每个点都对应到4个点
edges = edges[sample(c(TRUE, FALSE),
                     nrow(edges),
                     replace = TRUE,
                     prob = c(0.8, 0.2)), ]
head(edges)
# N1 N2 Value
# 1 A1 A2 2.147
# 2 A1 B2 4.726
# 3 A1 C2 4.755
# 4 A1 D2 2.442
# 5 B1 A2 3.191
# 6 B1 B2 3.146

nodes = data.frame(ID = unique(c(edges$N1, edges$N2)), stringsAsFactors = FALSE)
#
nodes$x = as.integer(substr(nodes$ID, 2, 2))
nodes$y = as.integer(sapply(substr(nodes$ID, 1, 1), charToRaw)) - 65
#
rownames(nodes) = nodes$ID
head(nodes)
#    ID x y
# A1 A1 1 0
# B1 B1 1 1
# C1 C1 1 2
# D1 D1 1 3
# A2 A2 2 0
# B2 B2 2 1

# 添加颜色
library(RColorBrewer)
# 后面加调淡颜色
palette = paste0(brewer.pal(4, "Set1"), "60")

# 对每个节点生成相应的格式
styles = lapply(nodes$y, function(n) {
  list(col = palette[n + 1],
       lty = 0,
       textcol = "black")
})
names(styles) = nodes$ID

# 以list的结构保存一遍调用
rp <- list(nodes = nodes,
           edges = edges,
           styles = styles)
class(rp) <- c(class(rp), "riverplot")
plot(rp, plot_area = 0.95, yscale = 0.06)

1520168425975

使用包d3Network或者circlize生成交互式图形

d3Network是调用D3的画图功能来实现,实现代码示例:

library(d3Network)
d3links <- edges
d3nodes <-
  data.frame(name = unique(c(edges$N1, edges$N2)), stringsAsFactors = FALSE)
d3nodes$seq <- 0:(nrow(d3nodes) - 1)

d3links <- merge(d3links, d3nodes, by.x = "N1", by.y = "name")
names(d3links)[4] <- "source"

d3links <- merge(d3links, d3nodes, by.x = "N2", by.y = "name")
names(d3links)[5] <- "target"
names(d3links)[3] <- "value"

d3links <- subset(d3links, select = c("source", "target", "value"))
d3nodes <- subset(d3nodes, select = c("name"))

# 画图并保存为html文件
d3Sankey(
  Links = d3links,
  Nodes = d3nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  fontsize = 12,
  nodeWidth = 30,
  file = "TestSankey.html"
)  

1520168379760


注: