Implement bzip2's run-length encoding

Dennis just a moment. 3 answers, 0 views
code-golf string compression encode

Background

After applying the BWT (as seen in Burrows, Wheeler and Back) and the MTF (as seen in Move to the printable ASCII front), the bzip2 compressor applies a rather unique form of run-length encoding.

Definition

For the purpose of this challenge, we define the transformation BRLE as follows:

Given an input string s that consists solely of ASCII characters with code points between 0x20 and 0x7A, do the following:

  1. Replace each run of equal characters by a single occurrence of the character and store number of repetitions after the first.

  2. Encode the number of repetitions after the first occurrence of the character, using bijective base-2 numeration and the symbols { and }.

    A non-negative integer n is encoded as the string bk…b0 such that n = 2ki(bk)+…+20i(b0), where i({) = 1 and i(}) = 2.

    Note that this representation is always unique. For example, the number 0 is encoded as an empty string.

  3. Insert the string of curly brackets that encodes the number of repetitions after the single occurrence of the corresponding character.

Step-by-step example

Input:  "abbcccddddeeeeeffffffggggggghhhhhhhh"
Step 1: "abcdefgh" with repetitions 0, 1, 2, 3, 4, 5, 6, 7
Step 2: "" "{" "}" "{{" "{}" "}{" "}}" "{{{"
Step 3: "ab{c}d{{e{}f}{g}}h{{{"

Task

Implement an involutive program or function that reads a single string from STDIN or as command-line or function argument and prints or returns either the BRLE or its inverse of the input string.

If the input contains no curly brackets, apply the BRLE. If the input contains curly brackets, apply its inverse.

Examples

INPUT:  CODEGOLF
OUTPUT: CODEGOLF

INPUT:  PROGRAMMING
OUTPUT: PROGRAM{ING

INPUT:  PUZ{LES
OUTPUT: PUZZLES

INPUT:  444488888888GGGGGGGGGGGGGGGGWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
OUTPUT: 4{{8{{{G{{{{W{{{{{

INPUT:  y}}}{{
OUTPUT: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

Additional rules

  • You cannot use any built-ins that compute the BRLE or its inverse of a string.

  • You can use built-ins that:

    • Compute the RLE or RLD of a string, as long as the number of repetitions is not stored in bijective base-2.

    • Perform base conversion of any kind.

  • Your code may print a trailing newline if you choose STDOUT for output.

  • Your code has to work for any input of 1000 or less ASCII characters in the range from 0x20 to 0x7A, plus curly brackets (0x7B and 0x7D).

  • If the input contains curly brackets, you may assume that it is a valid result of applying the BRLE to a string.

  • Standard code golf rules apply. The shortest submission in bytes wins.

3 Answers


jimmy23013 07/13/2015.

CJam, 50 48 bytes

l_{}`:T&1${_T#)_(@a?)+}%{(\2b)*}%@e`{(2b1>Tf=}%?

Thanks for Dennis for saving 2 bytes.

Try it online.

Explanation

l_              e# Read and duplicate input.
{}`:T           e# T = "{}"
&               e# If the input has '{ or '}:
    1$          e# Input.
    {           e# For each character:
        _T#)    e# If it is '{ or '}:
            _(  e# Return 0 for '{ or 1 for '}.
            @a  e# Otherwise, convert the character itself to an array (string).
        ?
        )+      e# If it is a number, increment and append to the previous array.
                e# If it is a string with at least 1 character, do nothing.
    }%
    {(\         e# For each character and bijective base 2 number:
        2b)*    e# Repeat the character 1 + that many times.
    }%
                e# Else:
    @           e# Input.
    e`          e# Run-length encoding.
    {(          e# For each character and length:
        2b1>    e# Convert the length to base 2 and remove the first bit.
        Tf=     e# Map 0 to '{ and 1 to '}.
    }%
?               e# End if.

isaacg 07/15/2015.

Pyth, 48 50 bytes

J`Hs?m*hdi+1xLJtd2tczf-@zTJUz@Jzsm+ed@LJtjhd2rz8

2 bytes thanks to @Maltysen.

Demonstration. Test harness.

Explanation:

J`Hs?m*hdi+1xLJtd2tczf-@zTJUz@Jzsm+ed@LJtjhd2rz8
                                                    Implicit: z = input()
                                                    H is empty dict.
J`H                                                 J = repr(H) = "{}"
   s                                                Print the concatenation of
    ?                        @Jz                    If z and J share any chars:
                     f     Uz                       Filter range(len(z))
                      -@zTJ                         On the absence of z[T] in J.
                   cz                               Chop z at these indices.
                                                    just before each non '{}'.
                  t                                 Remove empty initial piece.
     m*hd                                           Map to d[0] *
         i       2                                  the base 2 number                             
            xLJtd                                   index in J mapped over d[:-1]
          +1                                        with a 1 prepended.
                                             rz8    Otherwise, run len. encode z
                                 m                  map over (len, char)
                                         jhd2       Convert len to binary.
                                        t           Remove leading 1  
                                     @LJ            Map to element of J.
                                  +ed               Prepend char.
                                s                   Concatenate. 

feersum 07/15/2015.

OCaml, 252

t is the function that does the transformation.

#load"str.cma"open Str
let rec(!)=group_beginning and
g=function|1->""|x->(g(x/2)^[|"{";"}"|].(x mod 2))and($)i s=if g i=s then i else(i+1)$s and
t s=global_substitute(regexp"\(.\)\1*\([{}]*\)")(fun s->String.make(1$matched_group 2 s)s.[!1]^g(!2- !1))s

At first I was thinking I had to check for the presence of curly braces, but it turned out to be unnecessary. The decoding clearly has no effect on strings that are already decoded, and the encoding part proved equally harmless to ones that were already encoded.


HighResolutionMusic.com - Download Hi-Res Songs

1 Alan Walker

Diamond Heart flac

Alan Walker. 2018. Writer: Alan Walker;Sophia Somajo;Mood Melodies;James Njie;Thomas Troelsen;Kristoffer Haugan;Edvard Normann;Anders Froen;Gunnar Greve;Yann Bargain;Victor Verpillat;Fredrik Borch Olsen.
2 Sia

I'm Still Here flac

Sia. 2018. Writer: Sia.
3 Cardi B

Taki Taki flac

Cardi B. 2018. Writer: Bava;Juan Vasquez;Vicente Saavedra;Jordan Thorpe;DJ Snake;Ozuna;Cardi B;Selena Gomez.
4 Little Mix

Woman Like Me flac

Little Mix. 2018. Writer: Nicki Minaj;Steve Mac;Ed Sheeran;Jess Glynne.
5 Halsey

Without Me flac

Halsey. 2018. Writer: Halsey;Delacey;Louis Bell;Amy Allen;Justin Timberlake;Timbaland;Scott Storch.
6 Lady Gaga

I'll Never Love Again flac

Lady Gaga. 2018. Writer: Benjamin Rice;Lady Gaga.
7 Bradley Cooper

Shallow flac

Bradley Cooper. 2018. Writer: Andrew Wyatt;Anthony Rossomando;Mark Ronson;Lady Gaga.
8 Bradley Cooper

Always Remember Us This Way flac

Bradley Cooper. 2018. Writer: Lady Gaga;Dave Cobb.
9 Kelsea Ballerini

This Feeling flac

Kelsea Ballerini. 2018. Writer: Andrew Taggart;Alex Pall;Emily Warren.
10 Mako

Rise flac

Mako. 2018. Writer: Riot Music Team;Mako;Justin Tranter.
11 Dewain Whitmore

Burn Out flac

Dewain Whitmore. 2018. Writer: Dewain Whitmore;Ilsey Juber;Emilio Behr;Martijn Garritsen.
12 Avril Lavigne

Head Above Water flac

Avril Lavigne. 2018. Writer: Stephan Moccio;Travis Clark;Avril Lavigne.
13 Khalid

Better flac

Khalid. 2018. Writer: Charlie Handsome;Jamil Chammas;Denis Kosiak;Tor Erik Hermansen;Mikkel Stoleer Eriksen;Khalid.
14 Lady Gaga

Look What I Found flac

Lady Gaga. 2018. Writer: DJ White Shadow;Nick Monson;Mark Nilan Jr;Lady Gaga.
15 Deep Chills

Run Free flac

Deep Chills. 2018.
16 Dynoro

In My Mind flac

Dynoro. 2018. Writer: Georgi Kay;Feenixpawl;Ivan Gough.
17 Charli XCX

1999 flac

Charli XCX. 2018. Writer: Charli XCX;Troye Sivan;Leland;Oscar Holter;Noonie Bao.
18 NCT 127

Regular (English Version) flac

NCT 127. 2018.
19 Lukas Graham

Love Someone flac

Lukas Graham. 2018. Writer: Don Stefano;Morten "Rissi" Ristorp;Morten "Pilo" Pilegaard;Jaramye Daniels;James Alan;David LaBrel;Lukas Forchhammer Graham.
20 Rita Ora

Let You Love Me flac

Rita Ora. 2018. Writer: Rita Ora.

Language

Popular Tags