This is the official repository for the paper: [ACL 2025] SHARE: Shared Memory-Aware Open-Domain Long-Term Dialogue Dataset Constructed from Movie Script
SHARE is a novel long-term dialogue dataset constructed from movie scripts, designed to enhance conversations by leveraging shared memories between individuals. It includes persona information, event summaries, and both explicit and implicit shared memories to enrich dialogue engagement. Additionally, we propose EPISODE, a dialogue framework that utilizes these shared experiences to make long-term conversations more engaging and sustainable.
You can download this dataset directly from Hugging Face:
👉 https://huggingface.co/datasets/eunwoneunwon/SHARE
```bash
mkdir -p datasets/
cd datasets/
git lfs install
git clone https://huggingface.co/datasets/eunwoneunwon/SHARE
```
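As a quick sanity check after cloning, a split can be loaded with the Python standard library. This is a minimal sketch, not part of the official tooling; the path assumes the clone location used above, and the top-level structure is assumed to be a list of session records:

```python
import json
import os

# Path assumes the dataset was cloned into datasets/ as shown above.
path = os.path.join("datasets", "SHARE", "data", "valid.json")

if os.path.exists(path):
    with open(path, encoding="utf-8") as f:
        sessions = json.load(f)  # assumed: a list of session records
    print(f"Loaded {len(sessions)} sessions from {path}")
else:
    print(f"{path} not found; clone the dataset first (see above).")
```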
You can explore the SHARE dataset, which is organized into `train`, `validation`, and `test` splits under the `data/` directory.
Each split is stored as a separate JSON file:
```
data/
├── train.json
├── valid.json
└── test.json
```
Below is a sample from the `valid.json` split of the SHARE dataset:
```json
{
  "session": 3,
  "dialogues": [
    {
      "speaker": "BERADA",
      "text": "I got you another six months. I told them it takes time.",
      "label": [
        "BERADA has ensured an extension of six months for the operation."
      ]
    },
    {
      "speaker": "DONNIE",
      "text": "Same budget?",
      "label": [
        "DONNIE and BERADA share past interactions concerning the operation, which involves managing a budget for an ongoing operation."
      ]
    },
    {
      "speaker": "BERADA",
      "text": "Same budget. Look, Joe, not that I don't see any movement, but--do you see any movement? I got my neck out on this.",
      "label": [
        "BERADA is responsible for managing the operation and feels pressure due to a lack of visible progress."
      ]
    },
    {
      "speaker": "DONNIE",
      "text": "Whatever it takes, I'm gonna get these bastards.",
      "label": [
        "DONNIE is dedicated to his mission and willing to do whatever it takes."
      ]
    }
  ]
}
```
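To illustrate the schema, here is a minimal sketch that parses a record shaped like the sample above and walks its turns. The record is inlined (abridged) so the snippet is self-contained:

```python
import json

# One session record from SHARE, abridged from the valid.json sample above.
sample = '''
{
  "session": 3,
  "dialogues": [
    {
      "speaker": "BERADA",
      "text": "I got you another six months. I told them it takes time.",
      "label": ["BERADA has ensured an extension of six months for the operation."]
    },
    {
      "speaker": "DONNIE",
      "text": "Same budget?",
      "label": ["DONNIE and BERADA share past interactions concerning the operation, which involves managing a budget for an ongoing operation."]
    }
  ]
}
'''

record = json.loads(sample)
print(f"Session {record['session']} has {len(record['dialogues'])} turns")
for turn in record["dialogues"]:
    # Each turn pairs an utterance with one or more memory annotations ("label").
    print(f"{turn['speaker']}: {turn['text']}")
    for annotation in turn["label"]:
        print(f"  -> {annotation}")
```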
Below are the Hugging Face model links used in this project. Click a model name to access and download it.
The implementation of the generation, selection, extraction, and update models in this project is inspired by the following repository:
Special thanks to the authors for providing an excellent foundation for our work.
Clone this repository and install the required packages:

```bash
conda env create -f environment.yml
conda activate share
```
This section describes the code for creating an evaluation dataset.

- Run `automatic_eval.sh` inside the `update_task` folder.
- The entire system's modules will execute, generating the final response.

For more details about automatic evaluation, refer to the `eval/automatic_evaluation` folder.
This section explains the code for multi-session evaluation.

- Run `multi_session_eval.sh` inside the `update_task` folder.
- Then, execute `multi_eval.sh` in the `evaluation/eval` folder to perform GPT evaluation.

Note: When running GPT evaluation, make sure to set the `OPENAI_API_KEY` environment variable.
This section provides instructions for EPISODE evaluation.

- Run `multi_session_eval.sh` inside the `update_task` folder.
- Then, execute `episode_share.sh` in the `evaluation/eval` folder to perform GPT evaluation.

Note: When running GPT evaluation, make sure to set the `OPENAI_API_KEY` environment variable.
If you use this project in your research, please cite it as follows: