Javatools
The Javatools are a collection of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They were developed by Fabian M. Suchanek for the YAGO-NAGA project. The Javatools are licensed under a Creative Commons Attribution 3.0 License by the YAGO-NAGA team.
Downloads
The tools require Java version 1.6+ (download here). You can
- Download the Java tools (version: 20-03-2014)
- Browse the full documentation
- Browse the short descriptions of the classes below
Tools
Parsing
Char | Decodes, encodes and normalizes UNICODE, UTF8, HTML and URI/URL strings |
DateParser | Parses and normalizes different date formats (e.g. "January 5th, 2000" or "500 BC") |
Name | Provides primitive heuristics to recognize and parse person names and organization names |
NounGroup | Splits a noun group (given by a String) into its modifiers and its head |
NumberParser | Parses and normalizes complex number expressions (e.g. "10 million meters") |
NumberFormatter | A simple number formatter |
PlingStemmer | Stems an English noun to singular. Knows nearly all exceptions. |
RegularExpression | Parses a regular expression and converts it to an automaton; can also invert it |
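To illustrate the kind of decoding and normalization the Char entry describes, here is a minimal sketch that uses only the Java standard library (`java.net.URLDecoder`, `java.net.URLEncoder`, `java.text.Normalizer`). It does not call the Javatools; the class `TextNormalizeSketch` and its method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of the Javatools.
final class TextNormalizeSketch {

    // Decodes a percent-encoded URL component, assuming UTF-8.
    static String decodeUrl(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Encodes a string as a URL form component, assuming UTF-8.
    static String encodeUrl(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Strips diacritics: decompose to NFD, then drop combining marks.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(decodeUrl("Z%C3%BCrich"));
        System.out.println(encodeUrl("a b"));            // a+b
        System.out.println(stripAccents("Z\u00fcrich")); // Zurich
    }
}
```

Stripping accents this way is lossy by design; whether that is acceptable depends on the application.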
Database
Database | This abstract class provides a simple wrapper for an SQL database, including a bulk inserter. |
MySQLDatabase | Implements the Database interface for a MySQL database |
OracleDatabase | Implements the Database interface for an Oracle SQL database |
PostgresDatabase | Implements the Database interface for a PostgreSQL database |
ResultIterator | An Iterator across an SQL database ResultSet |
SQLType | Implements SQL datatypes in a database-specific way |
WordNet | Provides a (non-database) wrapper for WordNet |
DBWordNet | Provides a database wrapper for WordNet |
Datatypes and Iterators
ArrayQueue | Implements a simple non-blocking queue |
BitVector | Implements a bit vector, i.e. a list of bits, a set of small integers |
CombinedIterator | Combines multiple iterators to one iterator |
CompressedString | Compresses a String in a potentially lossy way to 7, 6 or fewer bits per character (instead of 16) |
DirectedGraph | Implements a directed graph with ancestor finding |
FilteredIterator | Implements an iterator that allows filtering out certain elements |
FinalMap | Provides a nicer constructor for a TreeMap |
FinalSet | Provides a very simple container implementation with zero overhead |
FrequencyVector | Provides recall and precision measures on bags of words, including fuzzy recall and fuzzy precision, and Wilson interval computation |
Immutable | Wraps a list or a set so that it becomes immutable. |
IntSet | Implements a set of small integers as a bit vector with constant-time access |
IterableForIterator | Wraps an iterator so that it can be used in a for-each-loop |
IterableForEnumeration | Wraps an untyped enumeration into a typed iterator |
MappedIterator | Implements an iterator that maps each element by a function before yielding it |
Pair | For the simple datatype Pair |
PeekIterator | An Iterator that can look ahead (peek) one element |
SmallStack | Implements a fast stack for int, long, double |
SparseVector | Represents a Sparse Vector, i.e. a vector that has only few non-zero entries. Implements k-means |
SmallStack | Represents an efficient stack for primitive datatypes (int, long, double, boolean) |
SVMModel | Implements an SVM-light-Model |
Tree | For the simple datatype Tree |
Trie | Implements a trie (an efficient set of strings based on prefixes) |
UndirectedGraph | Implements an undirected graph |
Visitor | For the common visitor design pattern |
Visitable | For the common visitor design pattern |
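The look-ahead idea behind the PeekIterator entry can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the Javatools implementation: the class name `PeekingIteratorSketch` and its `peek()` method are hypothetical, and the sketch buffers exactly one element from the wrapped iterator.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical look-ahead iterator; not the Javatools PeekIterator.
final class PeekingIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T buffered;          // element fetched by peek() but not yet consumed
    private boolean hasBuffered; // true while 'buffered' holds a pending element

    PeekingIteratorSketch(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return hasBuffered || source.hasNext();
    }

    // Returns the next element without consuming it.
    // Throws NoSuchElementException (from the source) if nothing is left.
    T peek() {
        if (!hasBuffered) {
            buffered = source.next();
            hasBuffered = true;
        }
        return buffered;
    }

    @Override
    public T next() {
        T result = peek();
        buffered = null;
        hasBuffered = false;
        return result;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove is not supported");
    }

    public static void main(String[] args) {
        PeekingIteratorSketch<String> it =
                new PeekingIteratorSketch<>(Arrays.asList("a", "b").iterator());
        System.out.println(it.peek());    // a (not consumed)
        System.out.println(it.next());    // a
        System.out.println(it.next());    // b
        System.out.println(it.hasNext()); // false
    }
}
```

A separate boolean flag is used instead of a null check so that the wrapped iterator may legitimately yield null elements.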
Administrative
Announce | Provides an easy log writer with timed progress bars |
D | Provides convenience methods for input/output. Allows basic I/O with simple procedure calls, nearly like in normal programming languages. The class also provides basic set operations for EnumSets. |
CallStack | Allows to retrieve the method name and the source code line number of the current code position at runtime |
Parameters | Provides an interface for an initialization/configuration/properties-File |
Tracer | Provides a tracer that can figure out where a program hangs if a method runs for longer than a given time period |
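Retrieving the current method name and line number at runtime, as the CallStack entry describes, can be done with the standard `Throwable.getStackTrace()` call. The following is a minimal sketch of that technique, not the Javatools API; the class `CallSiteSketch` and its methods `here()` and `demo()` are hypothetical. Line numbers are only available if the class was compiled with debug information.

```java
// Hypothetical sketch; not the Javatools CallStack API.
final class CallSiteSketch {

    // Returns "Class.method:line" for the code position that called here().
    static String here() {
        // Element 0 is this method (where the Throwable is created),
        // element 1 is its direct caller.
        StackTraceElement[] trace = new Throwable().getStackTrace();
        StackTraceElement caller = trace[1];
        return caller.getClassName() + "." + caller.getMethodName()
                + ":" + caller.getLineNumber();
    }

    // Small caller used to demonstrate the result.
    static String demo() {
        return here();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // CallSiteSketch.demo:<line number>
    }
}
```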
File Handling
CSVFile | Writes to a comma-separated file (CSV file) |
CSVLines | Can iterate through the columns of a comma-separated file (CSV file). |
DeepFileSet | Represents a set of files as given by a wildcard string. Can recurse subfolders. |
FigureProducer | Produces LaTeX tables and JPG plots for table data |
FileLines | Provides an iterator over the lines in a file |
FileSet | Represents a set of files as given by a wildcard string. Does not include folders, is not case-sensitive. |
HTMLReader | Reads characters from an HTML-file |
MatchReader | Provides an iterator over Regular Expression matches in a file |
SimpleInputStreamReader | Reads characters from a file, regardless of the encoding |
SimpleOutputStreamWriter | Writes characters to a file, regardless of the encoding (see here for a problem description) |
UTF8Reader | Reads characters from a UTF8-encoded file |
UTF8Writer | Writes UTF8-encoded characters to a file |
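As a rough illustration of what UTF8-aware reading and writing involves, the sketch below writes and reads a text file with an explicit UTF-8 charset using only the standard library. It is not the Javatools UTF8Reader or UTF8Writer; the class `Utf8LinesSketch` and its methods are hypothetical. Naming the charset explicitly matters because readers and writers created without one may fall back to the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Javatools.
final class Utf8LinesSketch {

    // Writes the text to the file, encoded as UTF-8.
    static void write(Path file, String text) {
        try (Writer w = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads all lines of a UTF-8 encoded file.
    static List<String> readLines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Writes the text to a temporary file and reads it back line by line.
    static List<String> roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-sketch", ".txt");
            try {
                write(tmp, text);
                return readLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("gr\u00fcn\nblau").size()); // 2
    }
}
```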