Blockchain-Based PDF File Copyright Protection and
Tracing
Guangyong Gao
Nanjing University of Information Science and Technology
Xinyu Wan
Nanjing University of Information Science and Technology
Chongtao Guo
Nanjing University of Information Science and Technology
Bin Wu ( wubcst@163.com )
Jiujiang University
Research Article
DOI: https://doi.org/10.21203/rs.3.rs-3568563/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
As network technology advances at a rapid pace, an increasing number of indi-
viduals are utilizing the Internet to transmit and receive information. However,
copying, altering, and other malicious acts constantly threaten the copyright
ownership of electronic documents in this open environment of the Internet.
Problems such as copyright theft, illegal transmission, and information forgery are
emerging. Moreover, in the absence of copyright protection methods, PDF file owners
often struggle to secure the subsequent legal use of their PDF files after uploading
them to the Internet. Combining blockchain
and data hiding of PDF files for the first time, we propose a blockchain-based
copyright protection scheme for PDF files to uniformly handle the problems of
copyright authentication, authenticity identification, and copyright disputes of
PDF files. First, we propose a new information hiding algorithm based on the
incremental update characteristics of PDF, which embeds copyright information
in PDF files. Second, the scheme enables users to establish their access control
policies via smart contracts and delivers on-chain PDF file proofs, which can be
employed to validate the ownership rights of PDF files and track the transaction
history of PDF files. We have implemented the system on the Ethereum platform
and analyzed its security and traceability. The experimental results demonstrate
that the proposed information hiding algorithm possesses robust security and
that the copyright registration method is rapid.
Keywords: Blockchain, Copyright protection, PDF Forensics, Smart Contract
1 Introduction
Portable Document Format (PDF) is a page description language developed by Adobe
Systems and is regarded as a successor to the PostScript format for storing documents.
The PDF file format allows device-independent text, fonts, formatting, colors, and
graphic images to be encapsulated in a single file, making it independent of the
operating system platform [1]. A PDF file can also contain hypertext links, sound,
moving images, and other electronic information, and it supports very long documents.
Its integration, security, and reliability are higher than those of other formats.
PDF is widely used for documents such as letters of authorization, articles,
registration forms, and contracts [2]. However, while PDF files bring great
convenience to people, they also give rise to copyright disputes caused by illegal
piracy and malicious tampering.
Recent advances in network technology have enabled the rapid publication of dig-
ital media works on the Internet. The exponential growth of multimedia creations has
placed great demands on safeguarding intellectual property rights. Multimedia works
can be duplicated from the original and distributed to many potential consumers via
the Internet. Infringement of digital works is one of the main problems with digi-
tal technologies, which can cause significant harm to the rights of data owners and
diminish their drive to create new works.
The PDF file is susceptible to unauthorized distribution and alteration, as with
other digital files. Copyright protection for images, music, and other media has
advanced significantly. Numerous traditional digital content protection technologies,
such as encryption, digital rights management (DRM), watermarking, and digital fin-
gerprinting, have been designed to protect digital copyrights [3], [4], [5]. However, due
to some PDF properties, the process of copyright protection for PDF files is progressing
slowly. Currently, PDF copyright protection receives less attention, and individuals’
understanding of copyright laws is comparatively restricted. In response to the pre-
vailing issue of PDF infringement in the market, scholars aim to address the existing
challenge of rights preservation by employing technological methods.
However, the traditional copyright content market has problems such as low cred-
ibility in confirming copyright and difficulties in proving the traceability of rights.
The innovation-centred copyright business faces significant challenges due to online
piracy. In recent years, as a novel technology, blockchain development has created
new opportunities to solve the difficulties faced by digital copyright protection. The
concept of blockchain emerged from Bitcoin, which is a decentralized ledger system
that mandates universal participation in maintaining a record of the entire history of
value transactions [6], [7], [8]. The bottom layer of blockchain is a chronological data
blockchain structure, and cryptography and other technologies ensure the security of
each link in the system [9].
Blockchain technology can improve the efficiency of digital copyright protection
and offers alternative methods for evidence collection, digital asset trading, and
the protection of copyright owners' rights [10]. It offers greater transparency and
decentralization than traditional copyright protection techniques. A blockchain
copyright protection system can rapidly open the information channel among creators,
platform parties, and consumers, allowing parties seeking a copyright license to
contact creators for transactions at any time and in any location [11]. Blockchain
places no upper limit on the number of copyright registrations and transactions, which
can meet the large-scale demand for copyright from a cultural industry that is on the
verge of rapid growth [12].
This paper proposes a blockchain-based copyright protection and tracing scheme
for PDF files, drawing upon the crucial benefits of blockchain technology in the realm
of copyright management. The main contributions of this paper can be summarized
as follows:
1. This paper proposes a novel data hiding method based on the incremental update
for PDF files. The proposed method adheres to the PDF document specification and
embeds extensive information in PDF files without distorting their appearance.
2. It is the first time that blockchain and data hiding of PDF files are combined
to realize permission control and more convincing file tracing via smart contracts.
3. A blockchain-based PDF copyright management system is implemented, which
is built based on Ethereum. The copyright confirmation process is simulated as a way
to analyse the efficacy of the system.
The rest of this paper is organized as follows. Section 2 discusses the related work.
Section 3 illustrates some preliminaries. Section 4 describes the proposed data hiding
method for PDF files, and Section 5 presents the proposed blockchain-based PDF file
copyright protection model. In Section 6, a scenario is designed to facilitate experimen-
tal processing and evaluate performance. Finally, Section 7 gives a short conclusion
and future work.
2 Related Work
2.1 Steganography In PDF
Traditional methods use redundancy in the PDF body to hide data while focusing on
reducing the visual distortion caused by redundancy modification. Ahmad et al. [13]
presented an invisible watermark approach for PDF files. The approach is founded
upon a variable quantization index modulation (QIM) method called spread transform
dither modulation (STDM). The hidden message is encoded within a series of characters,
specifically within their x-coordinate values. However, Hatoum et al. [14]
subsequently highlighted the susceptibility of the STDM QIM technique and proposed
enhancements to improve its resistance against principal component analysis and
independent component analysis. Kuribayashi et al. [15] introduced a methodology that
integrates QIM with dither modulation (DM). This method identifies the most suitable
secret key from among the selected keys and embeds the selected keys' information as
side information.
In addition to the above methods, Stéphane et al. [16] proposed steganography methods
for PDF files based on the Chinese remainder theorem. In the proposed approach, the
cover PDF file is first cleared of unnecessary characters with ASCII code A0, and the
secret message is then hidden so that it is not visible in regular PDF readers. Behrooz
et al. [17] provided a way to hide data in PDF files relying on aligned text. The
hidden data is first compressed by Huffman coding. Afterwards, several unique cover
lines are selected as main lines. The embedding operation replaces the added space with
the main line's normal spacing. Finally, the PDF file is regenerated and sent.
Kuribayashi et al. [18] proposed a data-hiding scheme for PDF files by splitting the
space values between characters. Apart from the initial value used for saving the
rectified data, the remaining segmented values are encoded as a segment of the password
information.
In summary, traditional methods hide data by modifying the spaces between char-
acters. These methods reduce the document’s visual effect and have a low embedding
capacity. Furthermore, Liu et al. [19] proposed using the incremental update feature
of PDF to embed information. The new objects added in the method are not refer-
enced, which does not comply with the new PDF document specification. The current
version of the PDF reader refuses to access the document modified by the method.
a location mapping function are used to improve transaction processing. Recent stud-
ies have also explored using blockchain for audio file copyright protection. Meng et al.
[27] proposed that digital watermarking and perceptual hashing technologies can cre-
ate unalterable hashes within the domain of image copyright protection. Then, these
hashes can be loaded into a blockchain, which can be managed by specific technologies
and protocols to achieve complete protection. Cai et al. [28] combined deep learning
with blockchain to establish a digital system for safeguarding music copyrights and
evaluated its performance.
In summary, additional research is needed on blockchain applications for copy-
right protection, and recent studies have focused on image, digital music, and circuit
copyright. While the current employment of blockchain technology for safeguarding
PDF file copyright is absent, it holds considerable promise in enhancing the efficacy
of copyright protection for such files. Existing applications of blockchain technology
have demonstrated its advantages in distributed verification and data asset storage. It
can also be applied to PDF file copyright protection technology to support copyright
verification of PDF files.
3 Preliminaries
This section describes the structure of PDF, incremental updates, and smart contracts.
[Figure: structure of a PDF file: header, body, cross-reference table, and trailer.]
Fig. 2 File structure of an incrementally updated PDF file. [Figure: the initial structure
of the PDF file (header, original body, original cross-reference table, original trailer)
followed by update body 1 through incremental update n, ending with update trailer n.]
One notable benefit of employing this method for file updates is the expeditious preser-
vation of minor modifications made to extensive documents. The incremental update
will insert all newly created and modified items into the page. These objects comprise
the updated body at the end of the document, followed by a cross-reference table and
a new trailer. The cross-reference table includes details about objects updated in the
body [30]. The structure of the generated file is shown in Figure 2.
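The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `append_incremental_update` and the regular-expression parsing of the previous trailer are assumptions made for brevity, and only classic (non-stream) cross-reference tables are handled.

```python
import re


def append_incremental_update(pdf: bytes, obj_num: int, obj_body: bytes) -> bytes:
    """Append one object as an incremental update (hypothetical helper)."""
    # Values taken from the most recent trailer already in the file.
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    size = int(re.findall(rb"/Size\s+(\d+)", pdf)[-1])
    root = re.findall(rb"/Root\s+(\d+\s+\d+\s+R)", pdf)[-1]

    out = bytearray(pdf)  # the original bytes are never modified
    if not out.endswith(b"\n"):
        out += b"\n"

    # Updated body: the new object, appended at the end of the file.
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + obj_body + b"\nendobj\n"

    # New cross-reference section describing only the appended object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n" % obj_num
    out += b"%010d 00000 n \n" % obj_offset  # each entry is exactly 20 bytes

    # New trailer: /Prev chains back to the previous cross-reference table.
    new_size = max(size, obj_num + 1)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (new_size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Because the original bytes are left untouched and the new trailer points back to the old cross-reference table through `/Prev`, a reader can process the updates from the end of the file backwards.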
3.3 Smart Contract
Smart contracts have emerged as a particularly effective application of blockchain.
The approved contract terms are converted into an executable computer program that
preserves the logical connections among the contract clauses. The execution of each
contract statement is recorded as an immutable transaction stored on the blockchain
[31]. Smart contracts ensure correct access management and execution. Developers can
assign access privileges to specific functions within the contract. When a condition
specified in the smart contract is fulfilled, the associated statement is executed
automatically, invoking the corresponding function. Many online commercial transactions
are conducted through smart contracts running on blockchain systems. A smart contract
is a decentralized, tamper-resistant computer program. The essential appeal of smart
contracts resides in their capacity to remove the requirement for trusted intermediaries
in multi-party transactions. Smart contracts facilitate decentralized automation by
managing, validating, and executing the predetermined conditions of the underlying
protocol [32]. Ethereum is widely recognized as the prevailing blockchain platform for
smart contracts [33]. The platform is Turing-complete, which facilitates the
development of smart contracts that can be applied in real-world scenarios.
Step 5: Append a new trailer to the end of the PDF file so that the new trailer
records the new cross-reference table’s placement.
Algorithm 1 displays the pseudo-code for insertData(), which embeds confidential
information into the PDF file.
Algorithm 1 insertData
Input:
pf: a PDF file
M: copyright information
PK: encryption key
Output:
pf′: the PDF file with embedded information
1. M′ ← encryptData(M, PK)
/* Compress copyright information */
2. DM ← FlateDecode(M′)
/* Parse the PDF file */
3. cosDocument ← getCOSDocument(pf)
4. x ← getObjectNumber(cosDocument)
5. c ← getCatalog(cosDocument)
6. p ← getPageTree(cosDocument)
7. t ← getTrailer(cosDocument)
8. obj_a ← writeCatalog(++x, c)
9. off_a ← getOffset(obj_a)
10. obj_b ← writePages(++x, p)
11. off_b ← getOffset(obj_b)
12. obj_c ← writePage(++x)
13. off_c ← getOffset(obj_c)
14. s ← generateContentStream(++x, DM)
15. obj_d ← writeContentStream(x, s)
16. off_d ← getOffset(obj_d)
/* Calculate the cross-reference table */
17. xref ← calculateXref(off_a, off_b, off_c, off_d)
18. writeXref(xref)
19. off ← getOffset(xref)
20. writeTrailer(t, off)
21. pf′ ← save(cosDocument)
22. close(cosDocument)
23. return pf′
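The compression and stream-writing steps of Algorithm 1 (steps 2, 14, and 15) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `make_hidden_stream_object` and `read_hidden_stream_object` are hypothetical, the payload is assumed to be already encrypted (the ECC encryption step is omitted because it is not part of the Python standard library), and the catalog, page tree, and cross-reference bookkeeping of the remaining steps is left out.

```python
import re
import zlib


def make_hidden_stream_object(obj_num: int, ciphertext: bytes) -> bytes:
    """Wrap already-encrypted bytes in a Flate-compressed PDF stream object."""
    data = zlib.compress(ciphertext)  # zlib format is what /FlateDecode expects
    header = b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (
        obj_num,
        len(data),
    )
    return header + data + b"\nendstream\nendobj\n"


def read_hidden_stream_object(obj: bytes) -> bytes:
    """Recover the (still encrypted) payload from an object built above."""
    length = int(re.search(rb"/Length\s+(\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])
```

Using `/Length` rather than searching for `endstream` avoids misreading compressed bytes that happen to contain that keyword.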
4.2 Extraction
To extract the hidden information, first read the PDF file stream, then iterate through
all the objects in the PDF file stream, read the content stream with generation num-
ber 3, decompress the extracted content stream, and finally decrypt the confidential
information according to the key. In Algorithm 2, the pseudo-code for extractData()
is displayed.
Algorithm 2 extractData
Input:
pf′: the PDF file with embedded information
SK: decryption key
Output:
M: copyright information
1. br ← BufferedReader(pf′)
2. while (s ← br.readLine()) ≠ null do
3.   if isObject(s) = True then
4.     if getGenerationNumber(s) = 3 then
5.       obj ← br.readLine()
6.       if isContentStream(obj) = True then
7.         str ← getContentStream(obj)
8.         M′ ← Decompress(str)
9.         M ← decryptData(M′, SK)
10.        return M
11.      else
12.        continue
13.      end if
14.    else
15.      continue
16.    end if
17.  else
18.    continue
19.  end if
20. end while
21. return null
Since the file can be reconstructed, the proposed data hiding method is reversible.
The file can be restored by removing incremental updates, including the new cross-
reference table and the new trailer.
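The reversibility argument can be illustrated with a short sketch. It assumes that each revision of the file ends with its own `%%EOF` marker and that the marker does not occur inside binary stream data; the function name `strip_last_update` is hypothetical.

```python
def strip_last_update(pdf: bytes) -> bytes:
    """Drop the most recent incremental update, if there is one."""
    marker = b"%%EOF"
    last = pdf.rfind(marker)
    prev = pdf.rfind(marker, 0, last) if last != -1 else -1
    if prev == -1:
        return pdf  # zero or one revision: nothing to remove

    end = prev + len(marker)
    # Keep the end-of-line characters that closed the earlier revision.
    while end < len(pdf) and pdf[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return pdf[:end]
```

Truncating at the previous end-of-file marker removes the appended body, cross-reference table, and trailer in one step, which is why the original file can be recovered byte for byte in this simplified setting.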
Fig. 3 PDF permission restriction.
according to the permission function of the PDF file. According to the PDF specifica-
tion standards, a PDF file can use up to two passwords: the user and the owner. PDF
files are encrypted so long as any password is specified. The encryption dictionary
of the security processor retains permissions and information necessary for password
verification.
The owner password can restrict all permissions of the PDF file, such as editing,
copying, adding comments, and other rights. When using the owner password to open
PDF files, users get full access to them. The user password prevents anyone without
the password from accessing the PDF file. When using the user password to open PDF
files, the user obtains only the permissions that the owner has not restricted. For
example, suppose the owner has set a restriction on modifying the PDF file. In this
case, the user is prohibited from making any changes to the restricted PDF file
without using the owner's password. Figure 3 illustrates the PDF file property page
with all permissions restricted. Once any password is set for a PDF file, the PDF
reader prompts users to enter the password when they try to read the protected PDF
file. Figure 4 illustrates the use of Adobe Acrobat to open a PDF file set up with a
password.
Currently, many programs restrict PDF file permissions by establishing the PDF
owner password. In Adobe Acrobat, for instance, the PDF owner password is called the
Change Permissions password. Depending on the PDF reader, this password is also known
as the PDF permission password, restriction password, or PDF master password. Access
permissions are provided using flags that correspond to specific operations, such as
printing, document editing, and content copying. When PDF documents have the same
visual look before and after information concealment, it is difficult for attackers
to detect concealed information. By creating owner and user passwords to restrict PDF
files, copyright holders can conceal copyright information in PDF files and safeguard
confidential information. The user can only have full access to a protected document
if the owner password is correctly entered.
Fig. 4 PDF permission restriction.
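The permission flags mentioned above are stored as bits of a single integer in the encryption dictionary. The sketch below decodes four of them using the bit positions of the standard security handler in the PDF specification (bit 3 print, bit 4 modify, bit 5 copy, bit 6 annotate, counting from 1 at the low-order end). The constant and function names are chosen here for illustration, and the sample values are typical examples rather than values taken from the paper.

```python
# Bit positions are 1-based in the PDF specification, hence the shifts.
PRINT = 1 << 2      # bit 3: print the document
MODIFY = 1 << 3     # bit 4: modify the contents
COPY = 1 << 4       # bit 5: copy or extract text and graphics
ANNOTATE = 1 << 5   # bit 6: add or modify annotations


def allowed(p_value: int, permission: int) -> bool:
    """Return True if the permission bit is set in the /P value.

    /P is a signed 32-bit integer; Python's bitwise operators treat negative
    integers as two's complement, so the value can be tested directly.
    """
    return (p_value & permission) != 0


def describe(p_value: int) -> dict:
    """Summarize the four user permissions covered in this sketch."""
    return {
        "print": allowed(p_value, PRINT),
        "modify": allowed(p_value, MODIFY),
        "copy": allowed(p_value, COPY),
        "annotate": allowed(p_value, ANNOTATE),
    }
```

A file opened with the user password is limited to the operations whose bits are set, while the owner password bypasses these restrictions.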
[Figure: system workflow connecting the blockchain (Block 1, Block 2, ..., Block n), the
Web module, the encryption and decryption module, the owner, and the user; the circled
numbers mark the workflow steps.]
⑩ The permission purchaser signs the permission fee data with the private key
of the Ethereum account and sends the transaction to the blockchain.
⑪ The permission purchaser submits the PDF file to the encryption and
decryption module.
⑫ The encryption and decryption module returns the decryption key to the
permission purchaser.
⑬ The user sends the infringed PDF file to the Web module.
⑭ The Web module sends the infringed PDF file to the encryption and decryption
module.
⑮ The encryption and decryption module returns the copyright information in
the PDF file to the Web module.
⑯ The Web module writes the dishonest record of the copyright purchaser to the
blockchain.
[Figure: relationships among the user, owner, and administrator and the smart contracts
RC, RTC, VMC, and TC, with edges labeled "Register" and "Register copyright".]
2) Registration: In this stage, the entity's identity is mapped to the Ethereum
address through the smart contract RC. At the same time, users are assigned public
and private key pairs of the ECC encryption algorithm: (PKp, SKp) to encrypt and
decrypt PDF files and (PKm, SKm) to encrypt and decrypt copyright information.
3) Copyright registration: By default, the copyright owner has registered a web
subsystem user account and an Ethereum account and has logged in via the web terminal.
After the copyright owner uploads a PDF file to the encryption module, the encryption
module detects whether the PDF document already has copyright information and, if so,
returns an error message. If not, the encryption module encrypts the PDF document
with the ECC public key PKp to publish the copyright file. The module then encrypts
the copyright information with the ECC public key PKm and embeds it into the published
PDF document. After the copyright owner confirms the publication of the PDF file, the
encrypted PDF file, key SKp, and key SKm are transferred to the backend of the Web
subsystem. Next, the web module sends a copyright audit fee request to the owner.
The owner signs the audit fee with SK, the private key of the Ethereum account in
the local MetaMask wallet, and sends the transaction to the blockchain. After the
successful transaction, the smart contract RTC writes the audit fee transaction
information to the blockchain. Algorithm 3 displays the copyrightRegist() pseudocode.
4) Copyright transaction: When a user purchases a PDF file, the transaction infor-
mation will be transferred to the web module’s back-end database for storage. The web
module then sends the unsigned purchase fee transaction data to the local MetaMask
wallet.
After the user has signed the data using the private key SK of the MetaMask
wallet, MetaMask submits the transaction to the blockchain to pay the license fee
from the Ethereum account. After a successful payment, the user awaits the system's
response while the smart contract RTC writes the user's transaction information to
the blockchain.
After a successful purchase, the user locates the purchased PDF document on the
web module’s front end and transmits the transaction information to the encryption
and decryption module via the web module. The encryption and decryption module
decrypts an IPFS-stored PDF file. After setting the user password for the PDF file, the
module embeds the transaction information encrypted by the SKm key into the PDF
document. The encryption and decryption module uploads the password-protected
PDF file to the web module. The user then clicks on the front end of the web module
to download the user password and PDF file, completing the copyright transaction
phase. Algorithm 4 displays the copyrightTrade() pseudocode.
5) Check: The user uploads the PDF file to the encryption and decryption module,
which decrypts the PDF file. If the copyright information in the PDF document is
successfully extracted, it is submitted to the Web module. The administrator inputs the
decrypted copyright information and verifies the legal owner and author of the PDF file
by calling the smart contracts VMC and RTC. If the copyright information in the PDF
file is not successfully extracted, the title of the PDF file is extracted, and the
smart contracts VMC and RC are used to search for the legal owner and author of the
file. If no matching title is found, the PDF file has no confirmed copyright.
Algorithm 5 displays the verify() pseudocode.
Algorithm 3 copyrightRegist
Input:
pf: a PDF file
info: copyright information
owner_password: owner password for the PDF file
PKp: encryption key for the PDF file
PKm: encryption key for copyright information
SKp: decryption key for the PDF file
SKm: decryption key for copyright information
Output:
hash: hash value of the transaction
1. upload(pf)
/* Parse the PDF file */
2. cosDocument ← getCOSDocument(pf)
3. if hasCopyrightInfo(cosDocument) = True then
4.   return error
5. else
6.   pf ← encryptPDF(pf, owner_password)
7.   pf ← insertData(pf, info, PKp)
8.   uploadPdfSk(SKp)
9.   uploadCopyrightInfoSk(SKm)
10.  file_hash ← uploadFileToIpfs(pf)
11.  getCopyrightRegistCost()
12.  payCopyrightRegistCost()
13.  hash ← uploadToBlockchain()
14.  return hash
15. end if
6) Trace: The check stage first verifies whether the PDF file has confirmed
copyright. If so, the PDF file's title is extracted through the encryption and
decryption module, and the copyright information is then input to TC. TC obtains the
historical transaction records of the PDF file, identifies the dishonest user who
leaked the PDF file according to the records, and writes the user's dishonest record
to the blockchain. Algorithm 6 displays the trace() pseudocode.
7) Reward and Punish: This stage verifies dishonest users identified during the
trace stage. The transaction cost for dishonest customers grows as their number of
violations increases.
Algorithm 4 copyrightTrade
Input:
ID: number of the PDF file
PKm: encryption key for copyright information
info: transaction information
Output:
pf: a PDF file
password: user password of the PDF file
1. submitPdfId(ID)
2. h ← getIpfsHash(ID)
3. pf ← downloadIpfsFile(h)
4. owner_pw ← getOwnerPassword(ID)
5. password ← generatePassword(pf)
6. pf ← encryptPDF(owner_pw, password)
7. password ← encryptUserPassword(password, PKp)
8. info ← encryptData(info, SKm)
9. pf ← insertData(pf, info)
10. getCopyrightTradeCost()
11. payCopyrightTradeCost()
12. uploadToBlockchain()
13. downloadPdfAndPassword()
14. return pf, password
Algorithm 5 verify
Input:
pf: a PDF file
Output:
info: copyright information
password: user password of the PDF file
1. upload(pf)
/* Parse the PDF file */
2. cosDocument ← getCOSDocument(pf)
3. if hasCopyrightInfo(cosDocument) = True then
4.   ID ← getId(cosDocument)
5.   author_id ← getAuthorIdByFileId(ID)
6.   SKm ← getSKmById(author_id)
7.   info ← extractData(pf, SKm)
8.   hash ← getTransactionByInfo(info)
9.   chain_info ← getTransaction(hash)
10.  if info.equals(chain_info) = True then
11.    return info
12.  else
13.    return null
14.  end if
15. else
16.  t ← getTitle(cosDocument)
17.  if searchPdfAuthorByTitle(t) = True then
18.    author_id ← searchPdfAuthorIdByTitle(t)
19.    info ← getInfoByAuthorId(author_id)
20.    return info
21.  else
22.    return null
23.  end if
24. end if
Algorithm 6 trace
Input:
pf: a PDF file
Output:
history: transaction history
1. upload(pf)
2. cosDocument ← getCOSDocument(pf)
3. if hasCopyrightInfo(cosDocument) = True then
4.   info ← getCopyrightInfo(cosDocument)
5.   a ← getAuthor(cosDocument)
6.   author_id ← getIdByName(a)
7.   SKm ← getSkm(author_id)
8.   info ← Decompress(info)
9.   info ← decrypt(info, SKm)
10. else
11.  t ← getTitle(cosDocument)
12.  if searchPdfAuthorByTitle(t) = True then
13.    author_id ← getPdfAuthorIdByTitle(t)
14.    info ← getInfoByAuthorId(author_id)
15.  else
16.    return null
17.  end if
18. end if
19. user_id ← getUserId(info)
20. h ← trace_history(info)
21. record(user_id)
22. uploadToBlockchain(user_id)
23. return h
[Figure: increase rate [‰] versus file size of the original PDF file [kb].]
Usually, after setting permissions, the file size will increase. To conduct a more
comprehensive analysis of the rate at which file size increases in relation to the hidden
data size, ten new PDF files are downloaded for experimentation. All PDF files are
encrypted (owner password 123456, user password 123) and then embedded with 6400
bytes of data, respectively. The test outcomes plotted in Fig. 8 demonstrate that
although PDF files are encrypted, the growth rate decreases as the file size increases.
It shows that the algorithm also has good security for processing encrypted PDF files.
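The trend described above follows directly from how a relative increase behaves when the embedded overhead is roughly constant. The sketch below assumes that the increase rate is defined as the size difference divided by the original size, expressed in per mille; this definition and the function name are assumptions, and the 6400-byte overhead in the usage note is purely illustrative because the actual overhead depends on compression and on the objects added.

```python
def increase_rate_per_mille(original_size: int, embedded_size: int) -> float:
    """Relative growth of the file after embedding, in per mille (assumed definition)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (embedded_size - original_size) / original_size * 1000
```

For example, with a fixed hypothetical overhead of 6400 bytes, `increase_rate_per_mille(size, size + 6400)` shrinks as `size` grows, because the same overhead is divided by a larger original size.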
We also examine how the file size changes when different PDF conversion tools are
used. In this experiment, the PDF files contain no concealed data, which shows that
the size fluctuates depending on the converter used. First, the same content is
converted to PDF files using Microsoft Word, LaTeX, and the Adobe PDF library,
respectively. Then the sizes of the generated PDF files are recorded in Table I with
encryption (owner password set) and without encryption (no owner password). The
experimental results indicate that, despite identical textual content, the size of the
resulting PDF file varies. For instance, the PDF file generated by LaTeX is 28% smaller
than that generated by Microsoft Word, whereas the PDF document generated by the Adobe
PDF library is 289% larger than that generated by LaTeX. Therefore, file size is an
unreliable indicator of whether a file contains hidden data.
The proposed method is compared with the methods proposed in [15] and [18] by
hiding data on ten different PDF files to compare the increase in file size. The resulting
PDF file size data recorded in Table II shows that for the same file under the same
hidden data, the proposed method exhibits a minimal increase in file size as opposed
to all other methods.
Table 2 Comparison of File Size [Bytes] for Different Methods
Table 3 Functional Comparison among the Proposed and Conventional Data Hiding Methods
Table III summarizes the capability comparison of the proposed and conventional
methods. The methods proposed in [13] and [15] modify the spaces between characters
to embed information, which inevitably introduces visual distortion. On the other hand,
the method proposed in [18] embeds information by splitting the space characters,
hence avoiding any visual distortion. However, its embedding capacity is limited by
the number of space characters in the document. Based on this summary, the proposed
method is the most efficient: it offers higher security and payload, fully preserves
the file's visual appearance, and only slightly increases the file size.
[Figure: time [ms] for each of ten PDF files.]
In Ethereum, gas cost is an essential metric for assessing smart contracts.
Table IV shows the gas cost for deploying the four smart contracts. In July 2022, the
gas price was 0.02 ether per million gas. Table V shows the gas and associated storage
costs for invoking the main functions. The expenses associated with contract deployment
and execution are directly correlated with the value of Ether, and, in general, the
proposed method of copyright registration is low-cost.
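Given the quoted price of 0.02 ether per million gas, the fee for any operation is a simple proportion. The helper below is a small worked sketch; the function name and the gas amounts used in the usage note are illustrative and are not taken from Tables IV and V.

```python
from decimal import Decimal

# Gas price quoted in the text for July 2022: 0.02 ether per million gas.
GAS_PRICE_PER_MILLION = Decimal("0.02")


def gas_cost_in_ether(
    gas_used: int, price_per_million_gas: Decimal = GAS_PRICE_PER_MILLION
) -> Decimal:
    """Convert an amount of gas into ether at a price given per million gas."""
    return Decimal(gas_used) * price_per_million_gas / Decimal(1_000_000)
```

For instance, a hypothetical call consuming 250,000 gas would cost `gas_cost_in_ether(250_000)`, that is 0.005 ether at this price. `Decimal` is used instead of floating point so that small fees are not distorted by rounding.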
the proprietor of any PDF file. Second, a legitimate purchase transaction is delivered
to any purchaser who correctly executes the purchase procedure. Since the transaction
contains signatures issued by the buyer and the author, the buyer’s legitimacy may
be confirmed by verifying both the transaction and the blockchain. Finally, only users
with the PDF file password can read the PDF file.
2) Confidentiality of PDF files. Each PDF file is encrypted with the owner
password and stored in IPFS. The purchased PDF files are encrypted by the user’s
public key and transmitted over a network. In addition, the user password of the PDF
file is transmitted to the buyer and encrypted by the user’s public key. Therefore, only
legitimate users can read the PDF file.
3) The unforgeability of PDF files. As previously pointed out, because each
PDF file is encrypted with the owner password, viewing and fabricating the PDF file
is unfeasible without possessing the corresponding key. Because uploading PDF files
requires the author to provide copyright-certified transactions via digital signatures,
it is impossible to forge the content of the PDF file stored in IPFS. Therefore, without
knowledge of the author’s private key, it is impossible to falsify the content of published
PDF files.
4) Verifiability of PDF files. Every PDF document has a published hash value
corresponding to the original content, and the copyright information of the PDF's
creator is digitally signed. Through the signature, the user can authenticate the
identity of the PDF file's author. After downloading all encrypted PDF file fragments
from IPFS, the user verifies the authenticity of the file's content by comparing the
hash value of the downloaded content with the original content hash value.
5) Verifiability of the transaction. As mentioned above, PDF file transactions
contain digital signatures of the user and owner, and the transactions remain in the
chain for eternity. Therefore, due to the nature of the blockchain, transactions can
always be verified using the public keys of both parties.
6) Tracking of PDF files. Since the transaction information is hidden in the
PDF file of each transaction, the copyright holder can quickly identify the user who
leaked the PDF file through this information. In addition, the smart contract TC is
deployed on Ethereum and can be queried for the transaction records of PDF files,
which makes the tracking more convincing.
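The checks described in items 4) and 6) can be illustrated with a minimal sketch. It is not the implementation used in this work: SHA-256 is assumed as the content hash, the ledger is an in-memory dictionary standing in for the records returned by the contract TC, and the names `content_hash`, `verify_download`, `TransactionRecord`, and `trace_leak` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


def content_hash(fragments: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of the fragments concatenated in order."""
    digest = hashlib.sha256()
    for fragment in fragments:
        digest.update(fragment)
    return digest.hexdigest()


def verify_download(fragments: Iterable[bytes], published_hash: str) -> bool:
    """Item 4): compare the recomputed hash with the published hash."""
    # SHA-256 is an assumption; the published hash is expected as a hex string.
    return hmac.compare_digest(content_hash(fragments), published_hash.lower())


@dataclass(frozen=True)
class TransactionRecord:
    # Hypothetical fields; the actual on-chain record layout may differ.
    tx_id: str
    buyer: str
    file_hash: str


def trace_leak(extracted_tx_id: str,
               ledger: Dict[str, TransactionRecord]) -> Optional[str]:
    """Item 6): map the transaction ID recovered from a leaked copy to its buyer."""
    record = ledger.get(extracted_tx_id)
    return record.buyer if record else None
```

In this sketch, `verify_download` returns `False` as soon as any fragment is missing or altered, and `trace_leak` returns `None` when the identifier extracted from a leaked copy matches no recorded transaction, so an unknown identifier is never attributed to a buyer.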
7 Conclusion
To address the copyright dispute and PDF file tracking problems in PDF file forensic
scenarios, this study presents a novel approach for concealing data within PDF files
while maintaining the visual integrity of the original documents. The file size
increases after data concealment. However, analyzing file size in PDF files is
complicated because the same content can be described in many ways, and different
PDF converters produce PDF files of varying sizes. Four smart contracts are
constructed to implement PDF file copyright registration, transaction, and tracking.
The proposed scheme is implemented on a test chain and compared with other schemes.
The comparison shows that our system supports identity authentication, copyright
transaction, PDF file protection, and traceability.
Currently, only PDF files can be protected, but research will continue to expand the
scheme to protect more types of text files. In addition, the similarity of PDF files will
also be further evaluated.
Acknowledgments. The authors acknowledge the anonymous reviewer’s insightful
comments and suggestions.
Declarations
Ethics Approval. Not applicable
Conflict of Interest. The authors declare no conflict of interest.
Data Availability. Confidential.
Author Contribution. Guangyong Gao is responsible for text algorithm design
and blockchain modeling. Xinyu Wan is responsible for paper writing and smart
contract design and deployment. Bin Wu is in charge of experimental data and paper
review. Chongtao Guo assists in the completion of the experiment.
Funding. This work was supported in part by the Nature Science Foundation of
China under Grant 62262033, in part by the Science and Technology Research Project
of Jiangxi Education Department under Grant GJJ211815, in part by the Jiangxi
Key Natural Science Foundation under Grant 20192ACBL20031, and in part by the
Graduate Scientific Research Innovation Program of Jiangsu Province under Grant
SJCX23 0400.
Consent to publish. Authors are willing to publish the manuscript.
References
[1] Zhong, S., Cheng, X., Chen, T.: Data hiding in a kind of pdf texts for secret
communication. International Journal of Network Security 4(1), 17–26 (2007)
[2] Nursiah, N., Wong, K., Kuribayashi, M.: Reversible data hiding in pdf document
exploiting prefix zeros in glyph coordinates. In: 2019 Asia-Pacific Signal and Infor-
mation Processing Association Annual Summit and Conference (APSIPA ASC),
pp. 1298–1302 (2019)
[3] Lu, Z., Shi, Y., Tao, R., Zhang, Z.: Blockchain for digital rights manage-
ment of design works. In: IEICE Transactions on Fundamentals of Electronics,
Communications and Computer Sciences, pp. 596–603 (2019)
[4] Dong, P., Brankov, J.G., Galatsanos, N.P., Yang, Y., Davoine, F.: Digital water-
marking robust to geometric distortions. IEEE Transactions on Image Processing
14(12), 2140–2150 (2005)
[5] Yagi, H., Matsushima, T., Hirasawa, S.: Fingerprinting codes for multimedia
data against averaging attack. IEICE transactions on fundamentals of electronics,
communications and computer sciences 92(1), 207–216 (2009)
[6] Cai, C., Zheng, Y., Du, Y., Qin, Z., Wang, C.: Towards private, robust, and
verifiable crowdsensing systems via public blockchains. IEEE Transactions on
Dependable and Secure Computing 18(4), 1893–1907 (2019)
[7] Zou, R., Lv, X., Wang, B.: Blockchain-based photo forensics with permissible
transformations. Computers & Security 87, 101567 (2019)
[8] Xiao, Y., Zhang, P., Liu, Y.: Secure and efficient multi-signature schemes for
fabric: An enterprise blockchain platform. IEEE Transactions on Information
Forensics and Security 16, 1782–1794 (2020)
[9] Zhang, J., Cheng, Y., Deng, X., Wang, B., Xie, J., Yang, Y., Zhang, M.: A
reputation-based mechanism for transaction processing in blockchain systems.
IEEE Transactions on Computers 71(10), 2423–2434 (2021)
[10] Ahmadjee, S., Mera-Gómez, C., Bahsoon, R., Kazman, R.: A study on blockchain
architecture design decisions and their security attacks and threats. ACM
Transactions on Software Engineering and Methodology 31(2), 1–45 (2022)
[11] Xu, X., Rahman, F., Shakya, B., Vassilev, A., Forte, D., Tehranipoor, M.: Elec-
tronics supply chain integrity enabled by blockchain. ACM Transactions on
Design Automation of Electronic Systems 24(3), 1–25 (2019)
[12] Zhu, P., Hu, J., Li, X., Zhu, Q.: Using blockchain technology to enhance the trace-
ability of original achievements. IEEE Transactions on Engineering Management
(2021)
[13] Bitar, A.W., Darazi, R., Couchot, J.-F., Couturier, R.: Blind digital watermarking
in pdf documents using spread transform dither modulation. Multimedia Tools
and Applications 76, 143–161 (2017)
[14] Hatoum, M., Darazi, R., Couchot, J.: Blind pdf document watermarking robust
against pca and ica attacks. In: International Conference on E-business and
Telecommunications (2018)
[15] Kuribayashi, M., Wong, K.: Improved dm-qim watermarking scheme for pdf doc-
ument. In: Digital Forensics and Watermarking: 18th International Workshop,
IWDW 2019, Chengdu, China, November 2–4, 2019, Revised Selected Papers 18,
pp. 171–183 (2020)
[16] Ekodeck, S.G.R., Ndoundam, R.: Pdf steganography based on chinese remainder
theorem. Journal of Information Security and Applications 29, 1–15 (2016)
[17] Khosravi, B., Khosravi, B., Khosravi, B., Nazarkardeh, K.: A new method for pdf
steganography in justified texts. Journal of Information Security and Applications
45, 61–70 (2019)
[18] Kuribayashi, M., Wong, K.: Stealthpdf: Data hiding method for pdf file with no
visual degradation. Journal of Information Security and Applications 61, 102875
(2021)
[19] Liu, H., Li, L., Li, J., Huang, J.: Three novel algorithms for hiding data in pdf
files based on incremental updates. In: Digital Forensics and Watermarking: 10th
International Workshop, IWDW 2011, Atlantic City, NY, October 23-26, 2011,
Revised Selected Papers 10, pp. 167–180 (2012)
[20] Yang, Y., Guan, Z., Wan, Z., Weng, J., Pang, H.H., Deng, R.H.: Priscore:
blockchain-based self-tallying election system supporting score voting. IEEE
Transactions on Information Forensics and Security 16, 4705–4720 (2021)
[21] Xu, Y., Zhang, C., Zeng, Q., Wang, G., Ren, J., Zhang, Y.: Blockchain-enabled
accountability mechanism against information leakage in vertical industry ser-
vices. IEEE Transactions on Network Science and Engineering 8(2), 1202–1213
(2020)
[22] Wang, B., Jiawei, S., Wang, W., Zhao, P.: Image copyright protection based
on blockchain and zero-watermark. IEEE Transactions on Network Science and
Engineering 9(4), 2188–2199 (2022)
[23] Wamba, S.F., Queiroz, M.M., Trinchera, L.: Dynamics between blockchain adop-
tion determinants and supply chain performance: An empirical investigation.
International Journal of Production Economics 229, 107791 (2020)
[24] Kumar, R., Tripathi, R., Marchang, N., Srivastava, G., Gadekallu, T.R., Xiong,
N.N.: A secured distributed detection system based on ipfs and blockchain for
industrial image and video data security. Journal of Parallel and Distributed
Computing 152, 128–143 (2021)
[25] Lee, N.-Y., Yang, J., Kim, C.-S.: Blockchain-based smart propertization of digital
content for intellectual rights protection. Electronics 10(12), 1387 (2021)
[26] Xiao, L., Huang, W., Xie, Y., Xiao, W., Li, K.-C.: A blockchain-based traceable
ip copyright protection algorithm. IEEE Access 8, 49532–49542 (2020)
[27] Meng, Z., Morizumi, T., Miyata, S., Kinoshita, H.: Design scheme of copyright
management system based on digital watermarking and blockchain. In: 2018 IEEE
42nd Annual Computer Software and Applications Conference (COMPSAC), vol.
2, pp. 359–364 (2018)
[28] Cai, Z.: Usage of deep learning and blockchain in compilation and copyright
protection of digital music. IEEE Access 8, 164144–164154 (2020)
[29] Hatoum, M.W., Darazi, R., Couchot, J.-F.: Normalized blind stdm watermark-
ing scheme for images and pdf documents robust against fixed gain attack.
Multimedia Tools and Applications 79, 1887–1919 (2020)
[30] Wang, X.-y., Zhang, S.-y., Wen, T.-t., Xu, H., Yang, H.-y.: Synchronization
correction-based robust digital image watermarking approach using bessel k-form
pdf. Pattern Analysis and Applications 23, 933–951 (2020)
[31] Qian, P., Liu, Z., Wang, X., Chen, J., Wang, B., Zimmermann, R.: Digital resource
rights confirmation and infringement tracking based on smart contracts. In: 2019
IEEE 6th International Conference on Cloud Computing and Intelligence Systems
(CCIS), pp. 62–67 (2019). IEEE
[32] Qureshi, A., Megı́as Jiménez, D.: Blockchain-based multimedia content protec-
tion: Review and open challenges. Applied Sciences 11(1), 1 (2020)
[33] Delmolino, K., Arnett, M., Kosba, A., Miller, A., Shi, E.: Step by step towards
creating a safe smart contract: Lessons and insights from a cryptocurrency lab. In:
Financial Cryptography and Data Security: FC 2016 International Workshops,
BITCOIN, VOTING, and WAHC, Christ Church, Barbados, February 26, 2016,
Revised Selected Papers 20, pp. 79–94 (2016). Springer
Chongtao Guo received his BE degree in computer science and
technology from Yancheng Institute of Technology, China, in 2019.
He is currently pursuing his ME degree in electronic information
at the College of Computer Science, Nanjing University of
Information Science & Technology, China. His research interest is image
copyright protection based on blockchain.