Paper Summary: Using Web Corpus Statistics for Program Analysis

This paper uses control-flow, data-dependence, and program-dependence graphs to apply n-gram models to finding significant code plagiarism. It is the first paper to make extensive use of program structure information in statistical modeling. (The paper essentially works with the frequencies of subgraphs.)

  1. Problem / Focus
    1. Code plagiarism / copy-paste bugs
  2. Importance
  3. Context
  4. Approach
    1. Mapping N-gram models
      1. gram
        1. line of code (too general)
        2. each token (too small)
        3. Canonical Form
          1. (similar to three address code)
          2. one to many mapping, going from a line to canonical form
      2. N-grams
        1. a subgraph of the program dependence graph consisting of all paths of length (n - 1) starting from a node x
    2. Construct the models
      1. build abstract syntax trees with the V8 engine to generate the canonical form
      2. construct the program dependence graph, with data dependences derived from def-use chains
      3. construct control-flow-inspired edges
      4. find common subgraphs in the corpus (662 million of them)
        1. encode the graph as a string
          1. find a lexically minimal
  5. Results
    1. Code Plagiarism
      1. improves precision significantly, by 10-20%, without losing much recall
      2. However, it is not clear that using program structure has a significant edge over sequential n-grams based on program syntax
  6. Unique Contributions
    1. A model for extending corpus-driven statistics approaches to programmatic tasks
    2. Mapping n-gram models to programs
    3. The two applications
  7. Application
    1. Code Plagiarism detection
      1. improve the precision by removing
    2. Copy-paste bugs
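
To make the "Approach" items above concrete, here is a minimal Python sketch of how a graph n-gram could be extracted and counted. It is not the paper's implementation. The toy dependence graphs, the statement labels, and the function names `encode_ngram` and `count_ngrams` are all made up for illustration. Two simplifications are assumed: the n-gram rooted at x is unfolded along every dependence path of up to n - 1 edges, and sorting the child encodings stands in for the "lexically minimal" string encoding mentioned in the outline.

```python
from collections import Counter


def encode_ngram(graph, labels, root, n):
    """Canonical string for the paths of up to n-1 edges that start at root.

    graph maps a node to its successors (dependence edges); labels maps a
    node to its canonical-form statement. Both are hypothetical toy inputs.
    """
    successors = graph.get(root, ())
    if n <= 1 or not successors:
        return labels[root]
    # Sorting the child encodings makes the string independent of node ids
    # and of the order in which successors happen to be listed.
    children = sorted(encode_ngram(graph, labels, s, n - 1) for s in successors)
    return labels[root] + "(" + ",".join(children) + ")"


def count_ngrams(programs, n):
    """Count how often each encoded n-gram occurs across a corpus."""
    counts = Counter()
    for graph, labels in programs:
        for node in labels:
            counts[encode_ngram(graph, labels, node, n)] += 1
    return counts


# Two toy programs with the same dependence structure but different node
# ids and successor order (assumed data, not taken from the paper).
prog_a = ({0: [1, 2], 1: [2]}, {0: "t=call", 1: "t=t+c", 2: "ret t"})
prog_b = ({"a": ["b", "c"], "c": ["b"]}, {"a": "t=call", "b": "ret t", "c": "t=t+c"})

counts = count_ngrams([prog_a, prog_b], n=3)
for encoding, freq in sorted(counts.items()):
    print(freq, encoding)
```

Because both toy programs map to the same strings, each 3-gram is counted twice. Frequencies of this kind are what the summary means by working with "frequency of subgraphs". A real corpus would need a more careful canonical encoding than sorted child strings.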