Softpedia
 


    Duke 1.0

    Downloads: 559
    License / Price: Apache License 2.0 / $0
    Size / OS: 3.9 MB / Windows All
    Category: C: \ Programming \ Other Programming Files

    Duke description

    A fast deduplication engine

    Duke is a fast and flexible deduplication (entity resolution, or record linkage) engine written in Java on top of Lucene. At the moment it can process 1,000,000 records in 11 minutes (roughly 1,500 records per second) on a standard laptop in a single thread.

    Duke can be used to find duplicate records inside a single table/data source, or it can be used to find records in different tables/sources which most likely represent the same real-world entity.
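The two modes described above can be illustrated with a minimal, self-contained sketch. This is not Duke's API: the `Rec` class, the `sameEntity` rule, and the method names are hypothetical, and the matching rule (exact equality after trimming and lower-casing) is far cruder than a real engine's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LinkageSketch {
    /** Hypothetical record type: an id plus two free-text fields. */
    static final class Rec {
        final String id, name, city;
        Rec(String id, String name, String city) {
            this.id = id; this.name = name; this.city = city;
        }
    }

    static String norm(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    /** Assumed matching rule: both fields equal after normalization. */
    static boolean sameEntity(Rec a, Rec b) {
        return norm(a.name).equals(norm(b.name)) && norm(a.city).equals(norm(b.city));
    }

    /** Deduplication: compare every record with every later record in ONE source. */
    static List<String> deduplicate(List<Rec> source) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < source.size(); i++) {
            for (int j = i + 1; j < source.size(); j++) {
                if (sameEntity(source.get(i), source.get(j))) {
                    matches.add(source.get(i).id + "=" + source.get(j).id);
                }
            }
        }
        return matches;
    }

    /** Record linkage: compare records ACROSS two sources, never within one. */
    static List<String> link(List<Rec> left, List<Rec> right) {
        List<String> matches = new ArrayList<>();
        for (Rec l : left) {
            for (Rec r : right) {
                if (sameEntity(l, r)) {
                    matches.add(l.id + "=" + r.id);
                }
            }
        }
        return matches;
    }
}
```

Both loops compare all pairs, which is quadratic in the number of records. An engine built on a search index would presumably use the index to fetch a small set of candidates per record instead.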

    NOTE: Duke also runs on Mac and Linux platforms.

    Requirements:

    · Java

    What's New in This Release:

    Performance improvements:
    · Support for multi-threading added
    · Using NIOFSDirectory on all platforms except Windows
    · New in-memory backend, faster than Lucene (experimental)

    Changes to Comparators:
    · Geo-coordinate comparator added
    · Q-grams comparator added
    · Levenshtein implementation is now faster
    · Weighted Levenshtein weight estimator now knows position in string (issue 81)
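For readers unfamiliar with the string comparators named in this list, here is a minimal sketch of the two underlying techniques. It is not Duke's code: the class and method names are hypothetical, the Levenshtein version is the textbook two-row dynamic program, and the q-gram score is a Jaccard overlap of q-gram sets, which is only one of several formulas a q-gram comparator might use.

```java
import java.util.HashSet;
import java.util.Set;

class ComparatorSketch {
    /** Edit distance (insert, delete, substitute; each costs 1), two-row DP. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Distance scaled to [0, 1], where 1.0 means identical strings. */
    static double levenshteinSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    /** All distinct substrings of length q. */
    static Set<String> qgrams(String s, int q) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + q <= s.length(); i++) {
            grams.add(s.substring(i, i + q));
        }
        return grams;
    }

    /** Jaccard overlap of the two q-gram sets (an assumed scoring formula). */
    static double qgramSimilarity(String a, String b, int q) {
        Set<String> ga = qgrams(a, q);
        Set<String> gb = qgrams(b, q);
        if (ga.isEmpty() && gb.isEmpty()) {
            return a.equals(b) ? 1.0 : 0.0; // both shorter than q
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return (double) common.size() / union.size();
    }
}
```

Levenshtein is sensitive to character order, while q-gram overlap tolerates reordered tokens better, which is one reason an engine might offer both.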

    Changes to Cleaners:
    · Added PhoneNumberCleaner
    · Extended and generalized regexp cleaner
    · Removed sub-cleaner concept, added support for multiple cleaners
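As a rough illustration of what cleaners of this kind do, the sketch below normalizes values before comparison and applies several cleaners in sequence. It does not use Duke's classes: `CleanerSketch`, `cleanPhone`, `regexpCleaner`, and `applyAll` are hypothetical names, and modelling a cleaner as a `UnaryOperator<String>` is an assumption made for brevity.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CleanerSketch {
    /** Basic cleaner: trim and lower-case. */
    static final UnaryOperator<String> LOWER_TRIM =
        s -> s.trim().toLowerCase(Locale.ROOT);

    /** Phone cleaner sketch: keep digits only, preserving a leading '+'. */
    static String cleanPhone(String raw) {
        String trimmed = raw.trim();
        String digits = trimmed.replaceAll("[^0-9]", "");
        return trimmed.startsWith("+") ? "+" + digits : digits;
    }

    /** Regexp cleaner sketch: keep one capture group, or the input if no match. */
    static UnaryOperator<String> regexpCleaner(String regex, int group) {
        Pattern pattern = Pattern.compile(regex);
        return s -> {
            Matcher m = pattern.matcher(s);
            return m.find() ? m.group(group) : s;
        };
    }

    /** Multiple cleaners: apply each in order to the running value. */
    static String applyAll(List<UnaryOperator<String>> cleaners, String value) {
        String out = value;
        for (UnaryOperator<String> cleaner : cleaners) {
            out = cleaner.apply(out);
        }
        return out;
    }
}
```

For example, chaining `LOWER_TRIM` with `regexpCleaner("^(.*?)\\s+(inc|ltd)\\.?$", 1)` would strip a trailing company suffix after normalizing case and whitespace.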

    Other improvements:
    · Implemented user control over lookup props
    · Upgraded to Lucene 4.0
    · Added MatchListener.startProcessing() callback
    · Removed some MatchListener callback methods (weren't thread-safe)
    · InMemoryLinkDatabase now complete and tested
    · LinkDatabaseMatchListener bug fixes
    · Better validation of configurations
    · JDBCEquivalenceClassDatabase added
    · RDBMSLinkDatabase performance improvement

    Changes to command-line client:
    · Added data debug mode
    · Fixed bug with reu...

     


    TAGS:

    deduplication engine | entity resolution | record linkage | deduplication | engine | resolution
