Comparing 2 Columns

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Comparing 2 Columns

bill-2
HI all,
 
I have 2 columns that mostly contain data common to both. I would like to remove this leaving only the data unique to 1 column. Any thoughts on how best to accomplish this? Many thanks!
 
peace
bill d

_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list
Reply | Threaded
Open this post in threaded view
|

Re: Comparing 2 Columns

Morten Welinder-2
I don't think there is an easy spreadsheet solution to this, but if
you have the data in sorted text files,
the command line "comm -23 filea fileb" will do the job.

M.
_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list
Reply | Threaded
Open this post in threaded view
|

Re: Comparing 2 Columns

Tim Chase
In reply to this post by bill-2
On 2020-08-16 22:59, bill d wrote:
> I have 2 columns that mostly contain data common to both. I would
> like to remove this leaving only the data unique to 1 column. Any
> thoughts on how best to accomplish this? Many thanks! peace

It might help to have a better picture of your data.  Are they two
columns of data in common with a 3rd column such as

  Month  2019  2020
  Jan    4     4
  Feb    4     5
  Mar    8     8

and you just want the "Feb" row?  Or do you have two columns of
values

  1    2
  3    3
  4    5
  2    1

and you want 5 rows of their unique values:

  1
  2
  3
  4
  5

Do you need to maintain the source order?  Are they some sort of text
files that you could use other *nix tools rather than trying to do it
in Gnumeric?

With a better idea of what your data looks like and what you want the
results to look like, it would help craft a solution.

-tim




_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list
Reply | Threaded
Open this post in threaded view
|

Re: Comparing 2 Columns

GnuMeric mailing list
In reply to this post by bill-2
On 8/16/20 1:59 PM, bill d wrote:

> I have 2 columns that mostly contain data common to both. I would like to remove
> this leaving only the data unique to 1 column. Any thoughts on how best to
> accomplish this?

I doubt there is any reasonable way to do it within spreadsheet-land.

It can be done easily in an imperative programming language; see below.

Suppose the input is:

red
green white
black gray
blue black
white cyan
        magenta
        yellow

then the output is

                black
blue
        cyan
        gray
green
        magenta
red
                white
        yellow

where
   the first  output column is unique to the first  input column,
   the second output column is unique to the second input column, and
   the third  output column is common to both.

NOTE: You may find it convenient to save the output to a temporary .csv
file, then add that onto the end of your working file (using ssconvert
--merge-to ...), then copy-and-paste from the added sheet to wherever
the result actually belongs.  Then delete the added sheet.  I find this
to be less laborious and less error-prone than other ways of inserting
data.

===============================================
I call this 'csv-comm.pl'.
It is conceptually similar to the unix 'comm' command ... although it
takes inputs from a /single/ file, and does /not/ require them to be
sorted.

#! /usr/bin/perl -CS

use warnings;
use strict;
use Text::CSV 'csv';
use File::Temp;

main: {
  my $ifile = 'csv-comm.gnumeric';
  my $sheet = 'Sheet1';
  my $dir = File::Temp->newdir(TEMPLATE => "temp-XXXXXXXX");
  system "ssconvert -S $ifile $dir/csv-comm-%s.csv";
  my $aoa = csv(in => "$dir/csv-comm-$sheet.csv");
  my %rslt;
  for my $col (0, 1) {
    my $flag = 1<<$col;
    checker: for my $row ($aoa->@*) {
      my $val = $row->[$col];
      next checker if $val eq '';
      $rslt{$val} += $flag;
    }
  }
  for my $val (sort keys %rslt) {
    my $which = $rslt{$val};
    my @out = ('') x 4;
    $out[$which-1] = $val;
    print join(',', @out), "\n";
  }
}
_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list
Reply | Threaded
Open this post in threaded view
|

Re: Comparing 2 Columns

GnuMeric mailing list
Another version of the program.  Output is more dense, less sparse.

Input is:

red
green white
black gray
blue black
white cyan
        magenta
        yellow

Output is:

blue cyan black
green gray white
red magenta
        yellow


#! /usr/bin/perl -CS

use warnings;
use strict;
use Text::CSV 'csv';
use File::Temp;
use List::Util 'max';

main: {
  my $ifile = 'csv-comm.gnumeric';
  my $sheet = 'Sheet1';
  my $dir = File::Temp->newdir(TEMPLATE => "temp-XXXXXXXX");
  system "ssconvert -S $ifile $dir/csv-comm-%s.csv";
  my $aoa = csv(in => "$dir/csv-comm-$sheet.csv");
  my %rslt;
  for my $col (0, 1) {
    my $flag = 1<<$col;
    checker: for my $row ($aoa->@*) {
      my $val = $row->[$col];
      next checker if $val eq '';
      $rslt{$val} += $flag;
    }
  }
  my @dense = ();       ## an array of array-references
  for my $val (sort keys %rslt) {
    my $which = $rslt{$val};
    push $dense[$which-1]->@*, $val;
  }
  my $top = max(map {0+$_->@*} @dense);
  for (my $row = 0; $row < $top; $row++) {
    for (my $col = 0; $col < 3; $col++) {
      print shift($dense[$col]->@*) // '', ",";
    }
    print "\n";
  }
}
_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list