# Find elements that are not identical in two dataframes
# Useful link: http://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in
library(sqldf)    # run SQL queries against dataframes
library(compare)  # compare() for whole-object comparison
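# COMPARE solution
# d1 and d2 differ only in the last value of their second column
x <- c(1, 2, 3, 4, 5)
y1 <- c(1, 2, 3, 4, 5)
y2 <- c(1, 2, 3, 4, 6)
d1 <- data.frame(x, y1)
d2 <- data.frame(x, y2)
# Reports whether the two dataframes are the same
compare(d1, d2)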
# SQLDF solution
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3])
# Rows of a1 that are not present in a2
a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')
# Rows of a1 that are also present in a2
a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')
# DPLYR solution
library(dplyr)
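# Rows of a1 with no match in a2
anti_join(a1, a2, by = c("a", "b"))
# Rows of a1 that have a match in a2
semi_join(a1, a2, by = c("a", "b"))
# All rows from both a1 and a2
full_join(a1, a2, by = c("a", "b"))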
(8.6 update) To focus only on the columns that contain different elements, instead of scanning a bunch of identical columns (such as age or education, which you expect to match), the trick is to add an extra row below the original dataframe holding a logical value that indicates whether the two columns are identical. The identical() function in R can do that. Then it is just a matter of dataframe subsetting.
Pretty cool, huh?
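A minimal sketch of the identical() trick described above, under a few assumptions: the dataframes `old` and `new` and their columns are hypothetical example data, and both dataframes are assumed to have the same columns in the same order. It also keeps the logical flags in a separate vector rather than appending them as an extra row, since adding a logical row would coerce the column types.

```r
# Hypothetical example data: two versions of the same records
old <- data.frame(id = 1:4,
                  age = c(30, 41, 25, 58),
                  education = c("BA", "MS", "BA", "PhD"),
                  score = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
new <- old
new$score[3] <- 35  # introduce a single difference

# One logical flag per column: TRUE if the column is identical in both
same <- mapply(identical, old, new)

# Subset each dataframe down to the columns that differ
old_diff <- old[, !same, drop = FALSE]
new_diff <- new[, !same, drop = FALSE]
```

Here `same` is a named logical vector, so `names(same)[!same]` lists the columns worth inspecting, and `old_diff` and `new_diff` can be viewed next to each other.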