Decode Base64 To Multiple Files With Bash & Perl
Hey guys, ever found yourself staring at a massive HAR file, knowing it’s got all your video .ts files tucked away inside, encoded in Base64? Yeah, it’s a common pickle when you’re trying to grab specific video segments or analyze network traffic. But don’t sweat it! Today, we're diving deep into how you can decode that Base64 blob and save those .ts files as separate entities, using the power of Bash and a dash of Perl. We'll break down exactly what's happening, why it’s useful, and give you the step-by-step commands you need to get this done. So, buckle up, grab your favorite beverage, and let’s get these video files out of captivity!
Understanding the Problem: Base64 Encoded Files in HAR
So, what's the deal with Base64 encoding in HAR files, and why are we dealing with multiple .ts files? Alright, picture this: when web browsers or tools like Chrome DevTools record network activity (that's your HAR file), they often capture the content of responses. With media streaming like HLS (HTTP Live Streaming), the video is broken down into small chunks, often with a .ts (MPEG transport stream) extension. A HAR file is JSON, and JSON can't hold raw binary, so binary response bodies get Base64 encoded: every 3 bytes of data become 4 printable ASCII characters, and the entry is typically flagged with "encoding": "base64". Think of it as a text-only envelope that lets your video travel inside a JSON document without getting scrambled. It's an encoding, not encryption, so there's nothing to crack; you just need to open the envelope to get your videos back. And if the stream was made up of many chunks, the HAR file will contain many Base64 strings, each one representing a segment of your video. Our mission, should we choose to accept it, is to extract each of these encoded strings, decode them, and save each one as its own .ts file. This is super handy for reassembling a full video, troubleshooting streaming issues, or just for data recovery. We're not talking about one file here; we're talking about batch processing many encoded segments, which is where things get a bit tricky but also really rewarding once you nail it. This is where scripting comes in, guys, and it's going to make your life a whole lot easier than manually copy-pasting and decoding each chunk!
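To make the "envelope" idea concrete, here is a minimal round trip on a tiny stand-in payload. It assumes a `base64` command that accepts `--decode` (GNU coreutils and macOS both do); the string `hello` is just a placeholder for real video bytes.

```shell
# Encode a few raw bytes to Base64 text, the same way a HAR file stores a body.
encoded=$(printf 'hello' | base64)
echo "$encoded"    # prints aGVsbG8=

# Decode the text back into the original bytes.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"    # prints hello
```

Five bytes in, eight characters out (including one `=` of padding): that 4-for-3 growth is why Base64 bodies make HAR files so large.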
The Tool Belt: Bash and Perl to the Rescue
Now, you might be thinking, "Can't I just use an online Base64 decoder?" Well, for one file, maybe. But for potentially hundreds or thousands of .ts segments hidden within a HAR file? Absolutely not! That's where scripting comes in, and Bash and Perl are your trusty sidekicks for this task. Bash, the Bourne Again SHell, is the default shell on most Linux distributions and is available on macOS too (where zsh has been the default since Catalina, so run the commands under bash if anything behaves differently). It's fantastic for running commands, wiring them together with pipes, and orchestrating other programs. Think of it as the conductor of an orchestra, telling all the other instruments (programs) what to do and when. Perl, on the other hand, is a powerful scripting language known for its text-processing capabilities. It's like a master craftsman with a Swiss Army knife for strings and data. For our mission, Bash (with helpers like grep and sed) will handle the file-level work: finding the Base64 strings within the HAR file, creating directories, and feeding data down the pipeline. Perl will do the actual Base64 decoding, name and write the output files, and take on more complex parsing if needed. This duo is efficient and flexible, allowing us to automate a process that would be mind-numbingly tedious otherwise. Plus, these skills extend far beyond decoding files; you're building a foundation for automating all sorts of digital tasks. So, let's get our hands dirty and see how these two powerhouses work together!
Step 1: Isolating the Base64 Data with Bash
Alright, the first big hurdle is getting those Base64 encoded strings out of the HAR file. Remember, a HAR file is a JSON document, and our video chunks are embedded as strings within it. We need a way to pinpoint these specific strings, and Bash is going to be our primary tool for this initial extraction, using grep together with sed or awk to pull out the lines containing the Base64 data. The key is to identify a pattern that reliably surrounds your video .ts data. In a standard HAR file, each response body lives in the text field of the response's content object, right next to a mimeType such as video/mp2t and, for binary bodies, an encoding of base64. Let's assume that's where your Base64 data sits (content.text). A common approach is to use grep to find the entries with the right MIME type and then extract the value of that field. For example, you might use something like:
grep -A 3 '"mimeType": *"video/mp2t"' harfile.har | grep -o '"text": *"[A-Za-z0-9+/=]*"' | sed 's/^"text": *"//; s/"$//'
This is a simplified example, and the exact command will depend heavily on the structure of your specific HAR file. You might need to inspect the HAR file manually first using a text editor or a JSON viewer to understand how the Base64 data is formatted. Look for keys like content, encodedData, or text that hold long strings of characters which look like Base64 (a mix of uppercase and lowercase letters, numbers, +, /, and sometimes = for padding). Once you've identified the pattern, you can refine your grep and sed commands. You might need to extract multiple lines or use more advanced JSON parsing tools if the Base64 string spans across multiple lines or is nested deeply. The goal here is to end up with a stream of raw Base64 data, ideally one encoded string per line, ready for the next stage. We might also want to pipe this output into a temporary file for easier handling, or directly pipe it into our Perl script. Ensuring that we are only grabbing the Base64 content and not any surrounding JSON syntax is crucial. This initial isolation step is all about precision – getting only the data you need, without any of the surrounding noise. It’s the foundation upon which the rest of our decoding process will be built, so getting it right here saves a lot of headaches later on, trust me!
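Before pointing any extraction at a real capture, it can help to try it on a tiny hand-made HAR. The sketch below assumes a pretty-printed file in which the text field follows mimeType within three lines; the sample file, its contents, and the variable names are all made up for illustration, and a real HAR may be laid out differently.

```shell
# Build a hypothetical two-entry HAR fragment in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/sample.har" <<'EOF'
{
  "log": {
    "entries": [
      {
        "response": {
          "content": {
            "size": 5,
            "mimeType": "video/mp2t",
            "text": "aGVsbG8=",
            "encoding": "base64"
          }
        }
      },
      {
        "response": {
          "content": {
            "size": 9,
            "mimeType": "text/html",
            "text": "<p>hi</p>"
          }
        }
      }
    ]
  }
}
EOF

# Keep the lines near each video/mp2t entry, pull out Base64-looking
# "text" values, then strip the key and the surrounding quotes.
extracted=$(grep -A 3 '"mimeType": *"video/mp2t"' "$workdir/sample.har" \
  | grep -o '"text": *"[A-Za-z0-9+/=]*"' \
  | sed 's/^"text": *"//; s/"$//')
echo "$extracted"    # prints aGVsbG8=
```

Only the video entry comes through; the HTML body is dropped because it sits under a different MIME type. If your HAR is minified onto a single line, the `-A 3` context no longer narrows anything down, which is a good sign that a real JSON parser is the better tool.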
Step 2: Decoding with Perl and Saving Files
Now that we have our Base64 data isolated (ideally, one encoded string per line), it's time to bring in Perl to perform the actual decoding. Perl ships with a core module called MIME::Base64, which makes this incredibly straightforward. We'll write a short Perl script that reads each line of Base64 data from standard input (which we'll pipe from our Bash command), decodes it, and then writes the resulting binary data to a file. We also need a way to name these output files uniquely, and a simple counter works perfectly for this. Here's a sample Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64;
my $count = 1;
while (my $line = <STDIN>) {
    chomp $line;
    # Skip empty lines
    next unless $line;
    my $decoded_data = decode_base64($line);
    my $filename = sprintf("video_%04d.ts", $count);
    open(my $fh, ">", $filename) or die "Could not open file '$filename': $!";
    binmode($fh);    # write raw bytes with no newline or encoding translation
    print $fh $decoded_data;
    close $fh;
    print "Saved: $filename\n";
    $count++;
}
print "\nDecoding complete!\n";
To use this script, you would save it as something like decode_ts.pl. Then, you'd combine it with your Bash extraction command. If your Bash command outputs the Base64 strings to standard output, you can pipe it directly:
# Extract the Base64 strings (adjust to your HAR layout) and pipe them into the decoder
grep -A 3 '"mimeType": *"video/mp2t"' harfile.har | grep -o '"text": *"[A-Za-z0-9+/=]*"' | sed 's/^"text": *"//; s/"$//' | perl decode_ts.pl
This command sequence first extracts the Base64 strings using Bash commands, and then pipes that output directly into the decode_ts.pl script. The Perl script reads each Base64 string, decodes it using decode_base64, and saves it as video_0001.ts, video_0002.ts, and so on. The sprintf("video_%04d.ts", $count) part is crucial for creating sequentially numbered filenames with leading zeros, making them easy to manage. The use strict; and use warnings; are good Perl practices that help catch potential errors. We also include chomp $line; to remove any trailing newline characters from the input line before decoding, and next unless $line; to make sure we don't try to decode empty lines. The open(my $fh, ">", $filename) part creates or overwrites a file with the generated name, and print $fh $decoded_data; writes the actual binary video data. This combination is super powerful because it automates the entire process from raw HAR file to individual .ts files. You just run the command, and boom, you've got your video chunks ready to go!
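If you'd like to see the read-decode-number loop in isolation, or Perl's MIME::Base64 isn't handy, the same idea can be sketched in plain shell. This is an alternative illustration, not the Perl script itself: it assumes a `base64` command that accepts `--decode`, and the scratch directory and the `segments.b64` input file are hypothetical.

```shell
# Hypothetical scratch directory and sample input, for illustration only.
outdir=$(mktemp -d)
cd "$outdir" || exit 1

# Two Base64 lines ("hello" and "world") with an empty line in between.
printf 'aGVsbG8=\n\nd29ybGQ=\n' > segments.b64

count=1
while IFS= read -r line; do
  # Skip empty lines, as the Perl script does.
  [ -n "$line" ] || continue
  # Zero-padded, sequential filename: video_0001.ts, video_0002.ts, ...
  name=$(printf 'video_%04d.ts' "$count")
  printf '%s' "$line" | base64 --decode > "$name"
  echo "Saved: $name"
  count=$((count + 1))
done < segments.b64
```

Spawning one `base64` process per line is fine for a quick test but gets slow with thousands of large segments, which is one reason to keep the decoding in a single Perl process.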
Putting It All Together: The Full Command
Let's consolidate everything into a single, powerful command pipeline that you can run directly in your terminal. This is where Bash and Perl truly shine together, automating a complex task into something remarkably simple to execute. Remember, the exact grep and sed commands might need tweaking based on your specific HAR file structure. Always inspect your HAR file first to understand how the Base64 encoded video data is represented. But assuming a common structure where the Base64 content is directly associated with a video/mp2t MIME type, here’s how you can combine the steps:
First, let's create a directory to store our extracted video files. This keeps things organized and prevents cluttering your current directory.
mkdir extracted_ts_files
cd extracted_ts_files
Now, the core command pipeline. We'll redirect the output of the Perl script to a file named decode_log.txt so you can see which files were created, while the video files themselves will be saved in the current directory (extracted_ts_files):
grep -A 3 '"mimeType": *"video/mp2t"' ../harfile.har | grep -o '"text": *"[A-Za-z0-9+/=]*"' | sed 's/^"text": *"//; s/"$//' | perl -M