Paper Summary: DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

DECKARD: Scalable and Accurate Tree-based Detection of Code Clones

  1. Focus / Problem to be solved
    1. Existing approaches either do not scale to large code bases or are not robust against minor code modifications.
  2. Importance
    1. Clone detection helps find and eliminate duplicated code in large code bases
  3. Method
    1. Generate an abstract syntax tree (AST) or parse tree (comparing whole trees directly is too expensive for large programs)
    2. Generate a characteristic vector for the tree and for each of its subtrees
    3. Compare the characteristic vectors, rather than the trees themselves, to detect clones using scalable techniques
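The three method steps can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes Python's standard `ast` module as the parser, treats a characteristic vector as a simple count of node types in a subtree, and compares vectors pairwise by Euclidean distance in place of the scalable techniques the paper relies on. The helper names (`characteristic_vector`, `euclidean_distance`, `function_vectors`) and the sample source are hypothetical.

```python
import ast
import math
from collections import Counter


def characteristic_vector(tree: ast.AST) -> Counter:
    """Count how often each AST node type occurs in the subtree rooted at `tree`.

    Assumption: node-type counts stand in for the paper's characteristic vectors.
    """
    return Counter(type(node).__name__ for node in ast.walk(tree))


def euclidean_distance(u: Counter, v: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


def function_vectors(source: str) -> dict:
    """Map each function name in `source` to the vector of its subtree."""
    module = ast.parse(source)
    return {
        node.name: characteristic_vector(node)
        for node in ast.walk(module)
        if isinstance(node, ast.FunctionDef)
    }


# Hypothetical sample: `total` and `summed` differ only in identifier names.
SOURCE = '''
def total(xs):
    acc = 0
    for x in xs:
        acc = acc + x
    return acc

def summed(values):
    result = 0
    for v in values:
        result = result + v
    return result

def greet(name):
    print("hello", name)
'''

vectors = function_vectors(SOURCE)
d_clone = euclidean_distance(vectors["total"], vectors["summed"])
d_other = euclidean_distance(vectors["total"], vectors["greet"])
print(d_clone, d_other)
```

Because the vectors record structure and not identifier names, the two renamed functions end up at distance zero, while the unrelated function lies farther away. A distance threshold then decides which pairs are reported as clones; comparing every pair as done here would not scale to a large code base, which is why the vectors need a scalable grouping technique.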
  4. Context
    1. Tree similarity detection
    2. Studies on Simple Code Clones (evolution of software)
    3. Simple Clone detection
      1. CP-Miner, CCFinder (token-based, string-based)
    4. High-level Structural Clone detection using data mining techniques
      1. Frequent itemset mining
    5. Semantics-based clone detection (not very scalable)
      1. Uses program dependence graphs
  5. Results
    1. Scales to large code bases
    2. Detects more clones, including clones with lower similarity scores
  6. Unique contributions
    1. A new similarity definition using abstract syntax trees
    2. A scalable tree-based similarity calculation algorithm using characteristic vectors
  7. Possible applications
    1. Better clone detection tools