Re: Re: Comparing 2 Columns

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Re: Comparing 2 Columns

bill-2
They are text. Specifically, 1 column contains names of moth species known from my state. The second column has names of moth species known from an adjoining state. Most will be present in both states, which doesn't interest me. Some will be listed in one state or the other, but not both. This is the data i am trying to isolate. While i can do this manually, we are talking some 1,900 and 1,700 names. Further, this will surely come up again in future so I'd like a quicker solution. Many thanks for any suggestions!

--
peace
bill
On 8/16/20, 6:56 PM Tim Chase <[hidden email]> wrote:
On 2020-08-16 22:59, bill d wrote:
> I have 2 columns that mostly contain data common to both. I would
> like to remove this leaving only the data unique to 1 column. Any
> thoughts on how best to accomplish this? Many thanks! peace

It might help to have a better picture of your data. Are they two
columns of data in common with a 3rd column such as

Month 2019 2020
Jan 4 4
Feb 4 5
Mar 8 8

and you just want the "Feb" row? Or do you have two columns of
values

1 2
3 3
4 5
2 1

and you want 5 rows of their unique values:

1
2
3
4
5

Do you need to maintain the source order? Are they some sort of text
files that you could use other *nix tools rather than trying to do it
in Gnumeric?

With a better idea of what your data looks like and what you want the
results to look like, it would help craft a solution.

-tim





_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list
Reply | Threaded
Open this post in threaded view
|

Re: Comparing 2 Columns

Tim Chase
On 2020-08-19 07:57, bill d wrote:
> They are text. Specifically, 1 column contains names of moth
> species known from my state. The second column has names of moth
> species known from an adjoining state. Most will be present in both
> states, which doesn't interest me. Some will be listed in one state
> or the other, but not both. This is the data i am trying to
> isolate. While i can do this manually, we are talking some 1,900
> and 1,700 names. Further, this will surely come up again in future
> so I'd like a quicker solution. Many thanks for any suggestions!

I usually use command-line tools on *nix for this.  If you have one
file of moth-names in each state:

  $ cat local.txt
  george
  jane
  elroy
  judy

  $ cat adjacent.txt
  david
  judy
  jane
  michael

To find the ones where you only have one instance, you might do

  $ sort local.txt adjacent.txt | uniq -c | awk '$1 == 1{print $2}'
  david
  elroy
  george
  michael

This counts the unique instances of each moth name and then only
prints where the count is 1 ("$1 == 1") meaning the moth-name only
appeared once between the two files.

This falls over if either file can have the same moth more than once.
For that, I'd use a bit of pure awk (which should be available on any
Unix-like, whether Linux, a BSD, or MacOS X)

 $ awk '{++a[$1];if (NR == FNR)++loc[$1]; else ++adj[$1]} END {for (n in a) if (!(n in loc && n in adj)) print n}' local.txt adjacent.txt

That puts all the names in the "a" array, the local ones in the "loc"
array, and the adjacent-state's in the "adj" array, then after
loading them all, goes through all of them to ask which isn't in
both.  If you want to run it regularly, you can put it in a script:

  $ cat > disjoint.awk
  #!/usr/bin/awk -f
  {
    ++a[$1]
    if (NR == FNR)
      ++loc[$1]
    else
      ++adj[$1]
  }
  END {
    for (n in a)
      if (!(n in loc && n in adj))
        print n
  }

then make it executible:

  $ chmod +x disjoint.awk

and run it as needed:

  $ ./disjoint.awk local.txt adjacent.txt

You can tweak that last condition for whatever stats you want
to run, so if you just want local moths that aren't in the adjacent
state

  if (n in loc && ! n in adj)

or just adjacent moths that aren't local

  if (n in adj && ! n in loc)

The nice thing about this is that if you keep your historical data in
different files, you don't have to recreate the logic:

  $ ./disjoint.awk local2019.txt adjacent2019.txt
  $ ./disjoint.awk local2020.txt adjacent2020.txt

Sorry this is a Gnumeric list and I'm providing command-line-utility
solutions, but for doing repeatable tasks with varying input, I find
it easier to script the solution that I can reuse, rather than hoping
I got my expressions right in a spreadsheet. :-)

-tim



_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list
Reply | Threaded
Open this post in threaded view
|

Re: Comparing 2 Columns

GnuMeric mailing list

It is possible to conditionally format the columns so that items not in the other column are highlighted. You can also use the match or vlookup functions in a 3rd and 4th column to list items only in the first column and only in the second column.


For me conditional formatting was kind of difficult to understand at first but here is what you do:

To conditionally format the first column do the following:

- Select the data in column A (a2 to a14 in the example)

- right click on the selection and select Conditional Formatting...

- under "Condition:" select "Expression x evaluates to TRUE."

- in the box labeled "x:" enter

and(A2<>"",isna(match(A2,B$2:B$14,0)))

- the A2<>"" makes sure blank cells (perhaps at the end of the list) are not highlighted

- now select  the "Background" check box

- click "Edit Style"

- set the "Background Color" to what every you want

- Click "OK"

- Click the "Add" button.

- Click "Close"

Now all the items in column A that are not in column B are highlighted.

Repeat with the second column except change the conditional expression to:

and(B2<>"",isna(match(B2,A$2:A$14,0)))

We can also use another column to display names in column A that aren't in column B on the same row as the original name.

- Select cell C2 and enter:

=if(and(A2<>"",isna(match(A2,B$2:B$14,0))),A2,"")

- copy cell C2

- select cells c3 to c14 and paste the copied cell into these cells

Repeat with cell D2 and the range D3 to D14 but use the expression:

=if(and(B2<>"",isna(match(B2,A$2:A$14,0))),B2,"")


I've attached an example with the conditional formatting and the extra columns.


Regards,

Leon Mitchell





From: [hidden email]
To: [hidden email]
Sent: Monday, August 17, 2020 6:00:03 AM
Subject: gnumeric-list Digest, Vol 190, Issue 2

Send gnumeric-list mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.gnome.org/mailman/listinfo/gnumeric-list
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gnumeric-list digest..."


Today's Topics:

   1. Comparing 2 Columns (bill d)
   2. Re: Comparing 2 Columns (Morten Welinder)
   3. Re: Comparing 2 Columns (Tim Chase)


----------------------------------------------------------------------

Message: 1
Date: Sun, 16 Aug 2020 22:59:44 +0200
From: bill d <[hidden email]>
To: [hidden email]
Subject: Comparing 2 Columns
Message-ID:
        <trinity-8319fc96-bf42-4535-ad5d-28c975aff689-1597611584409@3c-app-mailcom-lxa08>
        
Content-Type: text/plain; charset="us-ascii"

An HTML attachment was scrubbed...
URL: <https://mail.gnome.org/archives/gnumeric-list/attachments/20200816/46eb7214/attachment.html>

------------------------------

Message: 2
Date: Sun, 16 Aug 2020 19:08:22 -0400
From: Morten Welinder <[hidden email]>
To: bill d <[hidden email]>
Cc: Gnumeric Mailing List <[hidden email]>
Subject: Re: Comparing 2 Columns
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="UTF-8"

I don't think there is an easy spreadsheet solution to this, but if
you have the data in sorted text files,
the command line "comm -23 filea fileb" will do the job.

M.


------------------------------

Message: 3
Date: Sun, 16 Aug 2020 18:56:14 -0500
From: Tim Chase <[hidden email]>
To: bill d <[hidden email]>
Cc: [hidden email]
Subject: Re: Comparing 2 Columns
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=US-ASCII

On 2020-08-16 22:59, bill d wrote:
> I have 2 columns that mostly contain data common to both. I would
> like to remove this leaving only the data unique to 1 column. Any
> thoughts on how best to accomplish this? Many thanks! peace

It might help to have a better picture of your data.  Are they two
columns of data in common with a 3rd column such as

  Month  2019  2020
  Jan    4     4
  Feb    4     5
  Mar    8     8

and you just want the "Feb" row?  Or do you have two columns of
values

  1    2
  3    3
  4    5
  2    1

and you want 5 rows of their unique values:

  1
  2
  3
  4
  5

Do you need to maintain the source order?  Are they some sort of text
files that you could use other *nix tools rather than trying to do it
in Gnumeric?

With a better idea of what your data looks like and what you want the
results to look like, it would help craft a solution.

-tim






------------------------------

Subject: Digest Footer

_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list


------------------------------

End of gnumeric-list Digest, Vol 190, Issue 2
*********************************************

_______________________________________________
gnumeric-list mailing list
[hidden email]
https://mail.gnome.org/mailman/listinfo/gnumeric-list

Book2.gnumeric (3K) Download Attachment