ULS is an object factory for general-purpose lexical analysis with UTF-8 support. It is provided as a C/C++ library, together with a couple of other tools, for the Windows and Linux platforms.
ULS was built to be an intuitive, practical, flexible, and optimized tokenizer.
Here are some key features of "ULS":
· ULS can instantiate multiple lexical-analysis objects.
· Each object can process multiple inputs of different languages.
· The language for lexical analysis is specified by a configuration file with the *.ulc suffix.
· ULS can tokenize input files encoded in UTF-8.
· The input file may contain words in a localized language as identifiers.
· ULS can stream tokens from many input files.
· The stream can be stored in a uls-file (*.uls) and replayed from it.
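
To make the UTF-8 and localized-identifier features above more concrete, here is a minimal sketch in C of how a tokenizer can accept non-ASCII words as identifiers. It is not ULS's API: the names tok_kind, token and next_token are hypothetical and exist only for this illustration. The sketch assumes the input is valid UTF-8 and simply treats every byte with the high bit set as part of an identifier, which is one common simplification and may differ from what ULS actually does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not ULS's own token set. */
typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } tok_kind;

typedef struct {
    tok_kind kind;
    const char *start; /* points into the source buffer */
    size_t len;        /* length in bytes, not in code points */
} token;

/* Bytes >= 0x80 only occur inside multi-byte UTF-8 sequences, so
 * accepting them lets identifiers contain localized words. */
static int is_ident_start(unsigned char c) {
    return c >= 0x80 || isalpha(c) || c == '_';
}

static int is_ident_part(unsigned char c) {
    return is_ident_start(c) || isdigit(c);
}

/* Scan one token starting at *cursor and advance *cursor past it. */
token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const unsigned char *p = (const unsigned char *)*cursor;

    /* Skip ASCII whitespace between tokens. */
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
        p++;
    }

    token t = { TOK_EOF, (const char *)p, 0 };
    const unsigned char *q = p;

    if (*q == '\0') {
        /* End of input: leave the cursor on the terminator. */
    } else if (is_ident_start(*q)) {
        t.kind = TOK_IDENT;
        while (is_ident_part(*q)) q++;
    } else if (isdigit(*q)) {
        t.kind = TOK_NUMBER;
        while (isdigit(*q)) q++;
    } else {
        t.kind = TOK_PUNCT; /* any other byte is a one-byte token */
        q++;
    }

    t.len = (size_t)(q - p);
    *cursor = (const char *)q;
    return t;
}
```

A caller would point a const char * cursor at a NUL-terminated UTF-8 buffer and call next_token(&cursor) repeatedly until it returns TOK_EOF. Because the sketch works on bytes, a two-syllable Hangul identifier is reported with a length of 6, not 2.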