
What is Similarity Percentage All About and How Much is Allowed for Academic Papers?


Clarity. It gives a sense of understanding and a direction to move in. When it comes to plagiarism prevention tools, you may want exactly the same, and for a good reason.

Knowing what percentage of similarity counts as plagiarism would make reviewing papers a lot easier. You could quickly pick out the papers that need further investigation and invite students to a dialogue.

So, is there any set amount of duplication that can be seen as too high for academic papers? How does plagiarism differ from similarity? And what are the acceptable similarity scores?

Here are some of our thoughts and findings.

The Magic Behind a Similarity Percentage

What people love most about similarity checkers is that they deliver a similarity percentage once a submission has been compared against the target source(s). These can include web pages, books, journals, open-access databases, private repositories, consortia libraries, and more.

As for Unicheck, the algorithm runs checks across a real-time web index and the institution’s repository. To find matching sources, it divides each sentence into small chunks. Each chunk is then run through the search engine. If a duplicate is discovered, Unicheck will verify the source to ensure it’s worth adding to the report.
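To make the idea of chunking concrete, here is a minimal sketch in Python. It is not Unicheck’s actual algorithm: the chunk size of four words, the helper names (`split_sentences`, `chunk_sentence`, `find_matches`), and the dictionary standing in for a search index are all assumptions made for illustration.

```python
import re

def split_sentences(text):
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_sentence(sentence, size=4):
    # Break one sentence into overlapping word windows of `size` words.
    # The window size is an assumption, not a documented Unicheck value.
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return []
    if len(words) <= size:
        return [" ".join(words)]
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]

def find_matches(text, index, size=4):
    # `index` is a hypothetical stand-in for a search engine:
    # a dict mapping a known chunk to the source it came from.
    hits = []
    for sentence in split_sentences(text):
        for chunk in chunk_sentence(sentence, size):
            if chunk in index:
                hits.append((chunk, index[chunk]))
    return hits

# Usage: one chunk of the submission is found in the toy index.
index = {"quick brown fox jumps": "source-A"}
print(find_matches("The quick brown fox jumps. Nothing else matches here.", index))
```

A real checker would also verify each source before reporting it, as described above; that step is left out of this sketch.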

The total similarity score (TSS) is then calculated using this formula:

[Figure: Similarity Score Formula]

The color of the score displayed in the similarity report will largely depend on the “density” of the matches found and the total word count of the paper:

[Figure: Similarity Percentage Ranges]

At first glance, the red score seems to be the most severe one… However, this can often be misleading.

Red Triggers a Fight-or-Flight Response

When they see red-flagged submissions, instructors and students are often quick to view them as clear indicators of potential plagiarism.

Unicheck’s recently rolled-out analytics show that reports with red flags usually have the highest open rates.

It comes as no surprise. Historically, red has been used to symbolize alarm. People subconsciously take red flags for something that requires attention. Other colors do not produce such an effect. So, instructors and students may skip other similarity scores thinking that they signal only minor overlaps. But this isn’t as straightforward as it may seem.

Similarity Embraces So Many Things

The paper’s similarity percentage indicates how much similarity has been found after it’s been run against the databases and other sources a checker can access.

The similarities themselves may vary. Unicheck distinguishes between:

  • Matches – the text found in the institution’s repository or on the internet
  • Quotes – correctly formatted direct/indirect quotes
  • References – a works cited list provided at the end of the paper

Matches are text overlaps, so they influence the similarity percentage. Quotes and references, if formatted properly, have no impact on the score, as Unicheck will exclude them from search results by default.
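The effect of these three categories on the score can be sketched in a few lines of Python. This is an illustration under a loud assumption: the score is taken to be matched words divided by the total word count, times 100. The exact Unicheck formula is not reproduced here, and the `Span` class and `similarity_percentage` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Span:
    kind: str   # "match", "quote", or "reference"
    words: int  # number of words in the highlighted span

def similarity_percentage(total_words, spans, exclude_quotes_and_references=True):
    # Assumed formula: counted words / total words * 100.
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    counted = 0
    for span in spans:
        # Matches always count; quotes and references count only
        # when the exclusion filter is switched off.
        if span.kind == "match" or not exclude_quotes_and_references:
            counted += span.words
    return round(100 * counted / total_words, 1)

spans = [Span("match", 80), Span("quote", 120), Span("reference", 50)]
print(similarity_percentage(1000, spans))         # filter on
print(similarity_percentage(1000, spans, False))  # filter off
```

With the filter on, only the 80 matched words count toward the score. With it off, all 250 highlighted words do, which is how the same paper can end up with a much higher percentage.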

Still, the score shouldn’t be translated directly into grades. Here’s why.

What May Impact a Similarity Score

There are two sides to every coin. Below, we’ve listed several cases that can push a score up or keep it low. There are many nuances, so read on to get a better understanding of them:

Does a high percentage mean plagiarism has occurred?

High percentages do inform educators about struggling students. Yet, the similarity score may increase due to disabled search filters, incorrect formatting of quotes and references, or templates used across all the submissions. Let’s consider each case:

●  The filter excluding quotes and references is off

If quotes and references aren’t excluded from search results, they’ll contribute to the overall similarity score, and the score will be high. Make sure the filter is enabled for every assignment in the course, either through the assignment settings or the report. This will spare you the trouble of excluding them manually.

●  A new submission overlaps the previous one written by the same student

In such a case, you would get a very high percentage of similarity. Draft submissions should be excluded automatically. However, if your integration doesn’t support the exclusion of drafts, you may need to exclude them manually. This will adjust the total similarity score and help you focus on genuine matches only.

●  Quotes highlighted as borrowed text

Formatting quotes is a tedious task. If students fail to follow citation guidelines, Unicheck won’t recognize the quote and will consider it to be a match. The solution? Exclude such quotes manually to lower the score, and discuss the formatting with the student.

●  A template/assignment brief increases the similarity index

Assignment briefs are often included in submissions. This would inevitably lead to massive overlapping and increase the similarity percentage. Unicheck’s Ignore Text feature allows skipping these passages. The moment you do it, Unicheck will update the word count and stop viewing templates/assignment briefs as matches.

●  Matches that turned out to be common phrases/common knowledge

Oftentimes, terminology, descriptions, and common knowledge are included in the matching list. If used by everyone enrolled in a course, these chunks of text will coincide and, as a result, increase the total similarity score. These matches have nothing to do with borrowed text and should be excluded for all papers.
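To see how much an included template or assignment brief can distort the score, consider this small Python sketch of the Ignore Text case above. It assumes the score is matched words over total words, and that ignoring a passage removes its words from both counts. The function name `score_after_ignoring` and the numbers are made up for illustration, not taken from Unicheck.

```python
def score(total_words, matched_words):
    # Assumed formula: matched words / total words * 100.
    return round(100 * matched_words / total_words, 1)

def score_after_ignoring(total_words, matched_words, ignored_words):
    # Assumption: every ignored word was previously counted as a match,
    # and ignoring it shrinks both the match count and the word count.
    if ignored_words > matched_words or ignored_words >= total_words:
        raise ValueError("ignored passage is larger than expected")
    return score(total_words - ignored_words, matched_words - ignored_words)

# A 1200-word submission that pastes in a 300-word assignment brief
# and has 60 other matched words.
print(score(1200, 360))                      # before ignoring the brief
print(score_after_ignoring(1200, 360, 300))  # after ignoring the brief
```

In this made-up example, the score drops from 30.0 to 6.7 once the brief is ignored, even though nothing about the student’s own writing has changed.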

Can a 0-20% score be acceptable for all academic papers?

At Unicheck, we recommend double-checking reports even if a low similarity score shows up. Situations differ from one submission to another. That’s why similarity checkers do not provide any instructions on how to interpret the similarity percentage. Here are two more cases to support our point:

●  A tiny match might be a serious issue

Checking out micro matches is worth the effort. Unicheck may discover a small text overlap with a source that was otherwise heavily paraphrased. This is when a seemingly trivial match can point to a bigger issue. The Compare Mode in Unicheck will come in handy here. The submission will be displayed next to the matching source text, making it easier for you to navigate the matches.

●  A paper may have a low similarity score due to text modifications

Imagine two papers have the same percentage of similarity, below the 10% mark. Is it worth examining each? Definitely. One may have a few copy-pasted phrases or wrongly quoted words. The other one may contain some modified text that lowers the real score. If modifications are detected, you’ll be notified about suspicious text changes and prompted to open the Modifind tab for further details.
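One well-known kind of text modification is swapping Latin letters for look-alike characters from another alphabet, so that a copied word no longer matches its source. The Python sketch below flags words that mix scripts. It is only an illustration of the general idea: it is not how Modifind works, and the helper names `scripts_in` and `suspicious_words` are hypothetical.

```python
import unicodedata

def scripts_in(word):
    # Collect the script of each letter, read from the first word of its
    # Unicode name (e.g. "LATIN SMALL LETTER P" gives "LATIN").
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts

def suspicious_words(text):
    # A word that mixes two scripts is worth a closer look.
    return [w for w in text.split() if len(scripts_in(w)) > 1]

# "\u0430" is the Cyrillic letter that looks like a Latin "a".
sample = "This is clear plagi\u0430rism here"
print(suspicious_words(sample))
```

A flagged word is not proof of misconduct (multilingual text can mix scripts legitimately), so the output is a prompt for a manual look, in the same spirit as the report itself.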

Similarity Percentage Allowed in Colleges

Long story short, the checker mainly does two things for an educator. It calculates the number of overlaps and shows where they come from.

Dr. Thomas Lancaster described the purpose of a similarity checker this way:

[Image: Thomas Lancaster on Similarity Scores. Retrieved from a discussion on Quora]

Unicheck is just a tool for investigation. It enables instructors to examine the paper in more detail and make informed decisions. For students, it can play a guiding role, showing what needs improvement and what is done correctly.

Another question that frequently arises: “What percentage of similarity is considered plagiarism?”

You may stumble upon discussions that state 15-20% is the top score allowed in academia. However, this may not match your institution’s internal policies or the requirements set for a particular course. Here’s an opinion shared by a researcher, Master of Philosophy, Tudor Georgescu:

[Image: Acceptable Similarity Scores. Retrieved from a discussion on Quora]

Given all the above, no vendor can provide a single similarity threshold precise enough to serve as an automatic alarm bell.

An acceptable similarity percentage may differ from one assignment to another. In computer science, high similarity scores may be quite normal due to the many templates and patterns used. In linguistics, by contrast, a high percentage could signal unoriginal work. It’s up to the institution to decide what is allowed and how flexible the score can be for a course and assignment.

Key Takeaways:

Many times, some level of similarity is essential to support the statements made in papers. For instructors and students to be on the same page and draw maximum advantage out of a similarity report, we’ve compiled this checklist:

  • Making it clear to students which similarity scores are tolerated
  • Checking settings for the assignment and ensuring a filter for quotes and references is on
  • Going over all the highlighted sentences in the report and excluding those that are wrongly cited quotes or common phrases
  • Getting a final similarity score adjusted
  • Comparing the amount of cited and original material contributed
  • Making sure the text covers all the requirements set in the assignment brief
  • Making a final decision on a grade
