Jawk 6.4.00
Jawk in Java
For most Java applications, start with Awk[1]. It gives you:
- short convenience methods for string-in, string-out use cases
- compiled AwkProgram and AwkExpression artifacts for reuse
- direct access to AVM[2] when you want one reusable runtime
Start with Awk
Create an Awk instance directly for normal use:
Awk awk = new Awk();
Construct it with AwkSettings when you need engine defaults such as field separators, locale, or record separators:
AwkSettings settings = new AwkSettings();
settings.setFieldSeparator(",");
Awk awk = new Awk(settings);
AwkSettings Reference
| Setter | Default | Description |
|---|---|---|
| setFieldSeparator(String) | null (default AWK FS) | The initial value of FS, the field separator |
| setLocale(Locale) | Locale.US | Locale for numeric output formatting |
| setDefaultRS(String) | Platform line separator | Default value for RS, the record separator |
| setUseSortedArrayKeys(boolean) | false | Whether to keep associative array keys in sorted order |
| setAllowArraysOfArrays(boolean) | true | Whether the compiler accepts gawk-style nested array features such as a[i][j] and split(..., a[i]) |
| putVariable(String, Object) | Empty map | Pre-set variables available before BEGIN |
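To make the setLocale row concrete, the sketch below uses only the JDK (no Jawk classes) to show how a Locale changes the decimal separator when a number is formatted. Whether Jawk routes its numeric output through the same java.util.Formatter path is an assumption here; the class name LocaleFormatDemo is purely illustrative.

```java
import java.util.Locale;

// Plain-JDK illustration of locale-sensitive number formatting.
// This does NOT call Jawk; it only shows why a Locale setting matters.
final class LocaleFormatDemo {

    // Formats a value with two decimals using the given locale.
    static String format(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 1234.5));      // prints 1234.50
        System.out.println(format(Locale.GERMANY, 1234.5)); // prints 1234,50
    }
}
```

A fixed default such as Locale.US keeps numeric output stable across machines whose system locale differs.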
Output destination is specified per-call on the builder (execute(), execute(PrintStream), execute(OutputStream), execute(Appendable), or execute(AwkSink)). See the Custom Output[3] guide for details.
For more on passing variables to scripts, see Variables and Arguments[4].
By default, Jawk accepts both classic multi-dimensional array syntax (a[i, j]) and gawk-style arrays of arrays (a[i][j]). Disable this compile-time mode when you need strict classic AWK parsing; doing so also rejects subarray operands in array-only positions such as split(..., a[i]), for (k in a[i]), and "x" in a[i]:
AwkSettings settings = new AwkSettings();
settings.setAllowArraysOfArrays(false);
Awk awk = new Awk(settings);
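The difference between the two array syntaxes can be modeled in plain Java. In classic AWK, a[i, j] joins the subscripts into one string key using SUBSEP (default "\034"), while gawk-style a[i][j] makes each outer element an array of its own. The sketch below is a conceptual model only; it does not use Jawk, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model of the two AWK array styles (not Jawk's implementation).
final class ArrayKeyModel {

    // Default value of AWK's SUBSEP variable (octal 034).
    static final String SUBSEP = "\034";

    // Classic a[i, j]: both subscripts are joined into one flat string key.
    static String classicKey(String i, String j) {
        return i + SUBSEP + j;
    }

    public static void main(String[] args) {
        // Classic form: one flat map keyed by "i SUBSEP j".
        Map<String, Integer> flat = new HashMap<>();
        flat.put(classicKey("x", "1"), 10);

        // gawk-style a[i][j]: the outer element "x" is itself a map.
        Map<String, Map<String, Integer>> nested = new HashMap<>();
        nested.computeIfAbsent("x", k -> new HashMap<>()).put("1", 10);

        System.out.println(flat.get(classicKey("x", "1"))); // prints 10
        System.out.println(nested.get("x").get("1"));       // prints 10
        System.out.println(nested.get("x").keySet());       // prints [1]
    }
}
```

Only the nested form has a subarray such as a["x"] that can be iterated or passed around on its own, which is why constructs like for (k in a[i]) are rejected when arrays of arrays are disabled.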
Construct it with extension instances when you want those functions available to the script:
Awk awk = new Awk(StdinExtension.INSTANCE, new MyExtension());
The dedicated Writing Extensions[5] guide covers how to write your own extensions to expose new functions, written in Java, to your AWK scripts.
The Shortest Path: script().execute()
script().execute() is the smallest API surface for full AWK programs when you want the printed output back as a Java String:
Awk awk = new Awk();
String result = awk.script("{ print toupper($0) }").input("hello world").execute();
// result = "HELLO WORLD\n"
Use this when:
- you already have the script and input in memory
- you want the rendered AWK output as a String
- you do not need explicit ARGV, per-execution variables, or runtime reuse
Compiled Programs
When the same script will be reused, compile it once and run the compiled program:
Awk awk = new Awk();
AwkProgram program = awk.compile("{ print prefix $1 }");
String output = awk.script(program)
    .input("alpha beta\n")
    .execute();
Output Destination
Output is specified per-call on the builder:
- execute() returns the printed output as a String
- execute(PrintStream) sends output to a PrintStream such as System.out
- execute(OutputStream) sends output to any OutputStream
- execute(Appendable) captures text into a StringBuilder or any other Appendable
- execute(AwkSink) uses a fully custom output strategy
Custom Output with AwkSink
Use AwkSink[6] when plain text is not the right abstraction. An AwkSink receives raw print(...) and printf(...) calls together with the current AWK formatting state, so your host application can collect structured AWK output instead of rendered text.
Awk awk = new Awk();
CollectingSink sink = new CollectingSink();
awk.script("{ print $1, $2 }")
    .input("alpha beta\ngamma delta\n")
    .execute(sink);
See the Custom Output[3] guide for the full AwkSink contract, built-in implementations, and detailed examples.
Reusable Runtime: AVM
When you want to keep the same runtime alive across several calls, create an AVM:
Awk awk = new Awk();
AwkProgram program = awk.compile("BEGIN { print \"value\" }");
try (AVM avm = awk.createAvm()) {
    avm.setAwkSink(mySink);
    avm.execute(program, myInputSource, Collections.<String>emptyList(), null);
    avm.execute(program, myOtherInputSource);
}
AVM is sequential-only and intentionally stateful. Use it when performance matters and you want one reusable runtime for repeated program runs or repeated expression evaluation.
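Because a sequential-only, stateful runtime must never be entered by two threads at once, a host application that receives work concurrently may want to confine it to a single worker thread. The sketch below shows that general pattern with the JDK only. StatefulRuntime is a hypothetical stand-in and not a Jawk class; how AVM itself should be shared is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Thread-confinement sketch for a stateful, non-thread-safe runtime.
final class SequentialRuntimeDemo {

    // Hypothetical stand-in for a reusable runtime; NOT part of Jawk.
    static final class StatefulRuntime {
        private int runs; // state carried from one call to the next

        String run(String input) {
            runs++;
            return runs + ":" + input.toUpperCase(Locale.ROOT);
        }
    }

    // Submits every input to one worker thread, so calls run strictly in order.
    static List<String> runAll(List<String> inputs) {
        StatefulRuntime runtime = new StatefulRuntime();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String input : inputs) {
                pending.add(worker.submit(() -> runtime.run(input)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("runtime call failed", e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(List.of("alpha", "beta"))); // prints [1:ALPHA, 2:BETA]
    }
}
```

The single-thread executor serializes access without explicit locks, at the cost of no parallelism for that one runtime.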
Which API Should I Use?
- script(text).input(text).execute() for the shortest string-in, string-out path.
- compile(...) plus script(compiled).execute(out) when a whole AWK program is reused.
- compileExpression(...) plus eval(...) when one expression is reused.
- createAvm() when you want one reusable runtime across several calls.
Complete Example
The example below reads CSV input, sums the second column per category in the first column, and captures the result:
import io.jawk.Awk;
import io.jawk.util.AwkSettings;

public class JawkDemo {
    public static void main(String[] args) throws Exception {
        // Configure the engine for CSV input
        AwkSettings settings = new AwkSettings();
        settings.setFieldSeparator(",");
        Awk awk = new Awk(settings);

        // AWK script: accumulate totals by category, then print one line per category.
        // The for-in order is not sorted unless setUseSortedArrayKeys(true) is set.
        String script = "{ totals[$1] += $2 } END { for (k in totals) print k, totals[k] }";

        // Input data
        String csv = "fruit,10\nvegetable,20\nfruit,15\nvegetable,5\n";

        // Execute and capture the printed output
        String result = awk.script(script).input(csv).execute();
        System.out.println(result);
    }
}
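As a cross-check on what the AWK script computes, here is a plain-Java equivalent of the same aggregation, written without Jawk. It is a sketch for comparison only: the class name CategoryTotals is hypothetical, it uses a TreeMap so the order is deterministic, and it skips blank lines as a simplification.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java counterpart of:
//   { totals[$1] += $2 } END { for (k in totals) print k, totals[k] }
final class CategoryTotals {

    // Sums the second CSV column per value of the first column.
    static Map<String, Long> sumByCategory(String csv) {
        Map<String, Long> totals = new TreeMap<>(); // sorted keys for stable output
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) {
                continue; // simplification: AWK would still process a blank record
            }
            String[] fields = line.split(",");
            totals.merge(fields[0], Long.parseLong(fields[1].trim()), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String csv = "fruit,10\nvegetable,20\nfruit,15\nvegetable,5\n";
        // Mirror AWK's "print k, totals[k]" with a single space between fields.
        sumByCategory(csv).forEach((k, v) -> System.out.println(k + " " + v));
        // prints:
        // fruit 25
        // vegetable 25
    }
}
```

For the sample input both categories total 25, so the AWK version should report the same two sums, though not necessarily in this order.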
See Also
Next Steps
- [1] apidocs/io/jawk/Awk.html
- [2] apidocs/io/jawk/backend/AVM.html
- [3] java-output.html
- [4] java-variables.html
- [5] extensions-writing.html
- [6] apidocs/io/jawk/jrt/AwkSink.html
- [7] java-input.html
- [8] java-compile.html
- [9] java-advanced.html
- [10] extensions.html
