
Print lines between two patterns, the awk way…


Note: My awk guide.

Example input file:

test -3
test -2
test -1
OUTPUT
top 2
bottom 1
left 0
right 0
page 66
END
test 1
test 2
test 3

The standard way:

awk '/OUTPUT/ {flag=1;next} /END/{flag=0} flag {print}' inputFile

Output:

top 2
bottom 1
left 0
right 0
page 66

Self-explanatory indented code:

awk '
/OUTPUT/ {flag=1;next}        # Initial pattern found --> turn on the flag and read the next line
/END/    {flag=0}             # Final pattern found   --> turn off the flag
flag     {print}              # Flag on --> print the current line
' inputFile

The first optimization is to get rid of the print: in awk, when a pattern is true and has no action, print is the default action, so whenever the flag is true the line is echoed.
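A minimal check of that claim, using a few made-up input lines and two hypothetical helper functions (`with_print` and `without_print`): a pattern with no action prints the current record, so both programs should emit the same lines.

```shell
#!/bin/sh
# Made-up sample: a short version of the example input file.
sample_input() {
    printf '%s\n' 'test -1' 'OUTPUT' 'top 2' 'bottom 1' 'END' 'test 1'
}

# Explicit action: print the line while the flag is on.
with_print() {
    awk '/OUTPUT/{flag=1;next} /END/{flag=0} flag{print}'
}

# No action after "flag": awk falls back to printing the current line.
without_print() {
    awk '/OUTPUT/{flag=1;next} /END/{flag=0} flag'
}

sample_input | with_print
sample_input | without_print
```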

To drop the next statement while still not printing the tag line, we need to set the flag when the “OUTPUT” pattern is found, but only after the flag has already been evaluated for that line.

A slight variation of the program flow and we’re done:

awk '/END/{flag=0}flag;/OUTPUT/{flag=1}' inputFile
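To see the reordered one-liner in action, the sketch below rebuilds the example input with printf and wraps the program in a hypothetical `between_exclusive` function. Because the bare `flag` test sits between the two flag updates, neither tag line reaches it with the flag on.

```shell
#!/bin/sh
# Recreates the example input from the top of the post.
sample_input() {
    printf '%s\n' 'test -3' 'test -2' 'test -1' 'OUTPUT' 'top 2' \
        'bottom 1' 'left 0' 'right 0' 'page 66' 'END' \
        'test 1' 'test 2' 'test 3'
}

# /END/ clears the flag before the print test, and /OUTPUT/ sets it only
# after the test, so both tag lines are skipped.
between_exclusive() {
    awk '/END/{flag=0}flag;/OUTPUT/{flag=1}'
}

sample_input | between_exclusive
```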

P.S.: What if we only want to print the lines enclosed between the OUTPUT and END tags? Check this.

Categories: awk, linux, UNIX
  1. Anonymous coward
    05/10/2012 at 00:56

    sed -n '/OUTPUT/,/END/p'

    That should do the same job, only more elegantly.

  2. 11/10/2012 at 19:02

    Well, not exactly the same:

    # sed -n  '/OUTPUT/,/END/p' infile
    OUTPUT
    top 2
    bottom 1
    left 0
    right 0
    page 66
    END
    

    To exclude the tags you must go a little further:

    # sed -n '1,/OUTPUT/!{ /END/,/OUTPUT/!p; }' infile 
    top 2
    bottom 1
    left 0
    right 0
    page 66
    

    See: http://sed.sourceforge.net/sedfaq4.html#s4.24

    We run into the same inconveniences.

    I still prefer awk for excluding the tags.

    Cheers and thanks for the feedback.
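Side note: awk also accepts sed-style range patterns (`pattern1, pattern2`), so the tag-inclusive sed command has a direct awk counterpart. A small sketch on a trimmed copy of the example input (the function name is only illustrative):

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' 'test -1' 'OUTPUT' 'top 2' 'page 66' 'END' 'test 1'
}

# A range pattern matches from a line matching /OUTPUT/ through the next
# line matching /END/, both tags included, like sed -n '/OUTPUT/,/END/p'.
awk_inclusive() {
    awk '/OUTPUT/,/END/'
}

sample_input | awk_inclusive
```

Dropping the tags from a range pattern needs extra tests on each line, which is where the flag version stays simpler.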

  3. Sarathi
    17/10/2012 at 14:52

    Is there any way to avoid the repetition? For example, if I have text like the one below and I need only the first block between OUTPUT and END:

    OUTPUT
    top 2
    bottom 1
    left 0
    END
    right 0
    OUTPUT
    page 66
    test 1
    test 2
    test 3
    END

  4. 17/10/2012 at 19:44

    Of course, just use the exit statement when finding the first “END” tag.

    $ awk '/END/{exit}flag;/OUTPUT/{flag=1}' inputFile
    top 2
    bottom 1
    left 0
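A runnable version of that answer on Sarathi's input; `first_block` is just an illustrative wrapper. Once `exit` runs at the first END, awk stops reading, so the second OUTPUT block is never seen.

```shell
#!/bin/sh
# Sarathi's input: two OUTPUT ... END blocks.
sample_input() {
    printf '%s\n' 'OUTPUT' 'top 2' 'bottom 1' 'left 0' 'END' 'right 0' \
        'OUTPUT' 'page 66' 'test 1' 'test 2' 'test 3' 'END'
}

# exit at the first END: nothing after the first block is processed.
first_block() {
    awk '/END/{exit}flag;/OUTPUT/{flag=1}'
}

sample_input | first_block
```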
    • Sarathi
      18/10/2012 at 10:20

      Thank you so much klashxx, that helped!

  5. 18/10/2012 at 10:23

    Then, how can we get the lines between the next occurrence of the same two strings?

    • 18/10/2012 at 21:37

      Please describe what you are trying to accomplish with a clear example
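One possible reading of the question above (an assumption, since the thread leaves it open) is “print the second, or n-th, OUTPUT ... END block instead of the first”. Under that reading, the exit trick can be extended with a block counter; `nth_block` and its argument are hypothetical:

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' 'OUTPUT' 'top 2' 'bottom 1' 'left 0' 'END' 'right 0' \
        'OUTPUT' 'page 66' 'test 1' 'test 2' 'test 3' 'END'
}

# Hypothetical helper: print only the n-th OUTPUT ... END block, tags excluded.
# "count" holds how many OUTPUT tags have been seen so far.
nth_block() {
    awk -v n="$1" '
        /END/ && count == n { exit }     # closing tag of the wanted block: stop reading
        count == n                       # inside the wanted block: print the line
        /OUTPUT/ { count++ }             # opening tag: bump the block counter
    '
}

sample_input | nth_block 2
```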

  6. Chaitanya vemuru
    20/12/2012 at 15:27

    Is there any way to print the lines between two patterns only after some other pattern has been matched?
    E.g.:
    xyz
    abc
    asd
    sdf
    fghj
    kje
    dnsk

    I need a script that searches the file to check whether “xyz” and “abc” are on contiguous lines and, if so, prints the text between “asd” and “dnsk”.

    Please respond to this question ASAP.

    • 20/12/2012 at 23:43

      This way:

      $ cat file
      xyz
      abc
      asd
      sdf
      fghj
      kje
      dnsk
      $ awk '/dnsk/{exit}flag;/xyz/{c=NR}/abc/&&c&&NR==c+1{flag=1}' file
      asd
      sdf
      fghj
      kje
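An alternative sketch of the same idea, remembering the previous line instead of its line number. Note one deliberate difference: it compares whole lines with `==`, whereas the one-liner above uses regex matches, so “abc” inside a longer line would not count here. The function name is illustrative.

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' xyz abc asd sdf fghj kje dnsk
}

# The flag is raised only when a line equal to "abc" directly follows a
# line equal to "xyz"; "prev" is updated last, after all other tests.
after_contiguous_pair() {
    awk '/dnsk/ { exit }
         flag
         prev == "xyz" && $0 == "abc" { flag = 1 }
         { prev = $0 }'
}

sample_input | after_contiguous_pair
```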
  7. Christof R
    15/02/2013 at 14:24

    Hi. Thanks for this blog entry. I stumbled across it while searching for an extension of MultiMarkdown. I would like to define blocks:

    text
    text
    text
    
    ~~~~ ID
    Lorem ipsum dolor 
    sit amet, consectetur 
    adipiscing elit.
    ~~~~
    
    text
    text
    text
    

    and transform it to latex like:

    text
    text
    text
    
    \begin{ID}
    Lorem ipsum dolor 
    sit amet, consectetur 
    adipiscing elit.
    \end{ID}
    
    text
    text
    text
    

    I thought what you showed here would bring me there but I couldn’t figure out how. Any hints?

    BR

    • 18/02/2013 at 23:41

      Hi Christof, for your problem I would use a Perl one-liner to perform an in-place replacement:

      #cat inputFile 
      text
      text
      text
      
      ~~~~ ID
      Lorem ipsum dolor 
      sit amet, consectetur 
      adipiscing elit.
      ~~~~
      
      text
      text
      text
      #perl -pi -e 's/^~{4}\s+ID/\\begin{ID}/g;s/^~{4}\s*(?!ID)/\\end{ID}/g' inputFile
      #cat inputFile 
      text
      text
      text
      
      \begin{ID}
      Lorem ipsum dolor 
      sit amet, consectetur 
      adipiscing elit.
      \end{ID}
      
      text
      text
      • Christof R
        19/02/2013 at 18:17

        Hi and thanks for your answer. This is really concise; however, I did not make clear that ID is a variable string. So I have to read it somehow at the start of the block and use it again at the end. That got me down…

      • 21/02/2013 at 08:51

        OK, I see. There are many ways to solve your problem; give this a try:

        awk '/^~~~~/{if ($2!=""){s=$2};$0= $2!="" ? "\\begin{"s"}" : "\\end{"s"}"}1' inputFile
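To show that the captured ID really is reused, the sketch below feeds that one-liner two fenced blocks with different, made-up IDs. Field `$2` of the opening fence is saved in `s`, and the closing fence, which has no second field, is rewritten with the saved value; the trailing `1` is an always-true pattern that prints every line.

```shell
#!/bin/sh
# Made-up text with two fenced blocks and different IDs.
sample_input() {
    printf '%s\n' 'text' '~~~~ quote' 'Lorem ipsum' '~~~~' 'text' \
        '~~~~ note' 'dolor sit' '~~~~'
}

# Opening fence: remember the ID and emit \begin{ID}.
# Closing fence: no second field, so emit \end{ID} with the saved ID.
fences_to_latex() {
    awk '/^~~~~/{if ($2!=""){s=$2};$0= $2!="" ? "\\begin{"s"}" : "\\end{"s"}"}1'
}

sample_input | fences_to_latex
```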
      • DI Christof Rath
        22/02/2013 at 21:53

        Perfect. Thank you again.

      • bimleshsharma
        08/09/2013 at 16:54

        Actually, I wanted this with one condition:
        Log file:
        asd
        START
        as
        erg
        ege
        4t
        END
        lgjlkej
        nelgkl
        START
        lrkglk
        egiorgklk
        gljegj
        google
        lwekglk
        END

        So I need the lines between START and END, but only from the blocks that contain ‘google’. Please help.

      • 09/09/2013 at 08:52

        This solution uses an array: it rewinds the index if the “google” pattern is not present between the tags. Given this text file:

        asklasja
        asas
        START
        google
        saas
        END
        asd
        START
        as
        asassa
        da
        erg
        ege
        4t
        END
        lgjlkej
        nelgkl
        START
        lrkglk
        egiorgklk
        gljegj
        google
        lwekglk
        END
        assa
        asas

        The code will be:

        awk '/END/   {flag=0;if(x){L=j}else{j=L};x=0}
             flag && /google/{x=1}
             flag    {a[++j]=$0;next}
             /START/ {flag=1}
             END     {for (i=1;i<=j;i++){print a[i]}}' infile

        And the result:

        google
        saas
        lrkglk
        egiorgklk
        gljegj
        google
        lwekglk
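A sketch of why the `google` test should only fire inside a block. In this made-up input a stray “google” line sits between two blocks, and the second block does not contain it; with the test guarded by `flag &&`, only the first block survives. The function name is illustrative.

```shell
#!/bin/sh
# Made-up input: a "google" line outside any block, followed by a block
# that does not contain google.
sample_input() {
    printf '%s\n' START google saas END google START as erg END
}

# Array-rewind approach: keep the block (L=j) if google was seen in it,
# otherwise rewind the index (j=L) so its lines get overwritten.
blocks_with_google() {
    awk '/END/            {flag=0; if (x) {L=j} else {j=L}; x=0}
         flag && /google/ {x=1}
         flag             {a[++j]=$0; next}
         /START/          {flag=1}
         END              {for (i=1; i<=j; i++) print a[i]}'
}

sample_input | blocks_with_google
```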
  8. 17/09/2013 at 14:15

    Reblogged this on justanotherhumanoid and commented:
    Beautiful display of AWK craftsmanship. Don’t miss the solutions in the comments section.

    • Cithosi
      12/05/2014 at 10:49

      Hi,
      With a similar scenario, I need the output to include the START and END strings; the output would be like:

      START
      google
      saas
      END
      START
      lrkglk
      egiorgklk
      gljegj
      google
      lwekglk
      END

      Please help
      Thanks
      Cithosi

      • 12/05/2014 at 11:42

        Hello, just change the rule order in awk so that the flag is set before the “printing” rule runs:

        awk '/START/{flag=1}flag;/END/{flag=0}' infile

        Or use this concise sed:

        sed -n  '/START/,/END/p' infile

        It is up to you, but I suppose sed will perform slightly better on large files.
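The effect of rule order is easiest to see side by side. In this sketch (function names are illustrative) the same three rules are arranged four ways; the only thing that changes is whether the print test runs before or after each flag update on the tag lines.

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' asas START google saas END eer
}

# Flag set before the test, cleared after it: both tags printed.
both_tags() {
    awk '/START/ {flag=1}
         flag
         /END/   {flag=0}'
}

# Flag cleared before the test, set after it: body only.
no_tags() {
    awk '/END/   {flag=0}
         flag
         /START/ {flag=1}'
}

# Both updates before the test: START printed, END not.
start_only() {
    awk '/START/ {flag=1}
         /END/   {flag=0}
         flag'
}

# Both updates after the test: END printed, START not.
end_only() {
    awk 'flag
         /START/ {flag=1}
         /END/   {flag=0}'
}

for f in both_tags no_tags start_only end_only; do
    echo "== $f"
    sample_input | "$f"
done
```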

      • Cithosi
        12/05/2014 at 12:27

        Thanks for the quick reply. I need to print a block only if “google” exists within it; my file has over 3000 lines of text.

        Thanks

      • 12/05/2014 at 13:38

        OK, given this sample file:

        asas
        START
        weqeq
        eqwe
        eqwe
        google
        END
        eer
        START
        ccc
        ccc
        ccc
        END
        assa
        sas
        START
        zzzz
        google
        END
        lll

        You can apply this gawk program (gawk is available on most Linux systems; with other awks you may need to delete the elements of the array one by one):

        
        gawk '/START/ {flag=1;found=j=0;delete a}  # Beginning pattern -> initialization
              flag    {a[++j]=$0}                  # If flag is set, store the line in the array
              /google/ && flag{found=1}            # If google matches and flag is set, mark found
              /END/ && flag {flag=0                # Ending pattern inside a block: print the array if found
                       if(found){for (i=1;i<=j;i++){
                                print a[i]}}}' infile
        

        The result will be:

        START
        weqeq
        eqwe
        eqwe
        google
        END
        START
        zzzz
        google
        END
        

        The mechanism is simple: store the text between the tags, but print it only if the pattern was found.
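The same store-then-print machinery can also avoid arrays entirely by buffering the current block in a string, so resetting it at each START is a plain assignment rather than `delete`. This is a sketch under that assumption (aimed at awks where whole-array `delete` may be missing); the function name is illustrative.

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' asas START weqeq google END eer START ccc END lll
}

# The current block lives in the string "buf"; it is printed at END
# only if google was seen while the flag was on.
google_blocks_portable() {
    awk '/START/          { flag = 1; found = 0; buf = "" }
         flag             { buf = buf $0 "\n" }
         flag && /google/ { found = 1 }
         flag && /END/    { flag = 0; if (found) printf "%s", buf }'
}

sample_input | google_blocks_portable
```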

      • Cithosi
        12/05/2014 at 14:49

        It worked, much appreciated.

        Thanks

  9. Rahul
    23/12/2013 at 19:55

    Hi Klashxx, thanks for this incredibly helpful post. I have one question though. I want to pass the start tag as a command-line argument. This is what I tried:
    awk 'BEGIN{a='$1'}/END/{exit}flag;/a/{flag=1}' text.txt
    but it doesn’t work. What’s the solution?

    • 24/12/2013 at 08:33

      Hi Rahul, you can take the standard path, using the -v option:

      # awk -v pat="OUTPUT" '/END/{exit}flag;match($0,"^"pat){flag=1}' text.txt

      Or the tricky way:

      # pat="OUTPUT"
      # awk '/END/{exit}flag;/'"${pat}"'/{flag=1}' text.txt

      See: Passing values to awk, the trick.
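Building on the -v approach, both tags can be passed in as regular expressions and matched with the `~` operator. A sketch; `between` is a hypothetical helper name. Keep in mind that values given with -v go through awk's escape-sequence processing, so a backslash in a pattern has to be doubled.

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' 'test -1' 'OUTPUT' 'top 2' 'page 66' 'END' 'test 1'
}

# Hypothetical helper: between START_REGEX STOP_REGEX
# Both tags arrive through -v and are used as dynamic regexes.
between() {
    awk -v start="$1" -v stop="$2" '
        $0 ~ stop  { exit }        # closing tag: stop reading
        flag                       # inside the block: print
        $0 ~ start { flag = 1 }    # opening tag: start printing from the next line
    '
}

sample_input | between '^OUTPUT$' '^END$'
```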

  10. vibin
    02/02/2016 at 08:11

    Hi Klash

    I need help with an awk statement that looks between two patterns and prints the line only if there is exactly one line between them.

    I want to print abcd, since it is the only line in between.

    Example:

    ---
    abcd
    ---
    efgh
    ijklm
    ---

    • 10/02/2016 at 12:50

      Having:

      $ cat ex
      ---
      abcd
      ---
      efgh
      ijklm
      ---
      xxgfgh
      ---

      This code will do the trick:

      $ awk '/^---/{if(i==1){print a[i]};i=0;next}{a[++i]=$0}' ex
      abcd
      xxgfgh

  11. 30/06/2016 at 06:37

    Hi,

    I want to print the lines between the marker patterns, but I also want to print the lines matching other patterns.
    For example, if my input file is:
    START
    as
    erg
    ege
    END
    abc
    xyz
    pqr
    xyz
    asd
    NOW
    lrkglk
    egiorgklk
    gljegj
    google
    NOT

    I want to print the lines between START and END and between NOW and NOT, plus the lines matching xyz, i.e. my output should be:
    START
    as
    erg
    ege
    END
    xyz
    xyz
    NOW
    lrkglk
    egiorgklk
    gljegj
    google
    NOT

    Please help me, it’s very urgent!

    • 30/06/2016 at 07:47

      awk '/START|NOW/{flag=1}flag||/xyz/;/END|NOT/{flag=0}' file
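Running that answer on the input from the question (slightly shortened) shows how the pieces combine: the flag turns on at START or NOW before the print test and off at END or NOT after it, so both tags print, while `|| /xyz/` also prints matching lines outside the blocks. The function name is illustrative.

```shell
#!/bin/sh
sample_input() {
    printf '%s\n' START as END abc xyz pqr xyz asd NOW google NOT
}

# Print test: inside a START/END or NOW/NOT block, or any line matching xyz.
ranges_and_xyz() {
    awk '/START|NOW/{flag=1}flag||/xyz/;/END|NOT/{flag=0}'
}

sample_input | ranges_and_xyz
```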

