Telereview/code/Traitement Langage Naturel/TALnegation.ipynb
2023-02-23 09:37:06 +00:00


{
"cells": [
{
"cell_type": "code",
"execution_count": 163,
"metadata": {},
"outputs": [],
"source": [
"#pip install spacy\n",
"#!python -m spacy download fr_core_news_sm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part one: presentation of the problem and the materials\n",
"\n",
"We want to assign an overall satisfaction score to a list of customer reviews, along with a satisfaction score for each aspect that deserves particular attention (for example, waiting time at an amusement park or cleanliness at a hotel).\n",
"\n",
"To do this, we use a lexicon of French words, each associated with a positivity score, and a list of reviews of the Louvre museum.\n"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [],
"source": [
"# Path to the file containing French words, each associated with a score, in the form\n",
"# word1->its score\n",
"# word2->its score\n",
"# word3->its score ...\n",
"\n",
"lexiconPath = r\"fr_lexicon.txt\"\n",
"\n",
"\n",
"# Path to the file containing reviews of the Louvre museum, in the form\n",
"# Review1\n",
"# //Review2\n",
"# //Review3 ...\n",
"\n",
"reviewPath = r\"LouvreAvis.txt\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load the lexicon into a dictionary keyed by lowercased word, so that scoring a word from a review is a direct lookup rather than a scan of the whole lexicon."
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [],
"source": [
"scoreTable = {}\n",
"\n",
"# Add the word-score pairs to scoreTable\n",
"with open(lexiconPath, \"r\") as scoreWords:\n",
"    for line in scoreWords:\n",
"        parts = line.strip().split(\"->\")\n",
"        # Skip blank or malformed lines instead of failing on them\n",
"        if len(parts) == 2:\n",
"            scoreTable[parts[0].lower()] = float(parts[1])\n"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4 \n",
"\n",
"-2.0 \n",
"\n",
"0.2 \n",
"\n"
]
}
],
"source": [
"print(scoreTable[\"top\"],'\\n')\n",
"print(scoreTable[\"jaloux\"],'\\n')\n",
"print(scoreTable[\"bien\"],'\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part two: review analysis"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {},
"outputs": [],
"source": [
"# Read all reviews; the context manager closes the file once it has been read\n",
"with open(reviewPath, \"r\") as file:\n",
"    reviews = file.read().split('//')\n",
"\n",
"# (Partial) list of relevant keywords for a museum\n",
"keys=['attente', \"d'attente\", 'queue', 'patienter', 'patience', 'patient',\n",
"    'patients', 'patiente', 'patientes',\n",
"    'impolitesse' ,'impolie', 'impolies', 'impoli', 'impolis',\n",
"    'gentillesse', 'amabilité', 'aimable', 'aimables','gentil', 'gentils',\n",
"    'gentille', 'gentilles', 'personnel',\n",
"    'sales', 'sale', 'saleté', 'propre', 'propres', 'propreté',\n",
"    'accueil', 'acceuil', 'prix', 'cher', 'chers', 'chère', 'chères',\n",
"    'onéreux', 'onéreuse', 'onéreuses', 'abordable',\n",
"    'raisonnable', 'raisonnables', 'accessible', 'accessibilité', 'orienter','employé',\n",
"    'employés', 'employées', 'employée',\n",
"    'orientation', 'orienté', \"s'orienter\",\n",
"    'désorienter', 'désorienté', 'désorientée', 'désorientés', 'désorientées',\n",
"    'panneau', 'panneaux', 'signalétique', 'labyrinthe',\n",
"    'perdu', 'perdus', 'perdue', 'perdues']"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"nlp = spacy.load(\"fr_core_news_sm\")"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {},
"outputs": [],
"source": [
"# Average score of a review\n",
"averageScore = 0\n",
"# List of [keyword, associated score] pairs\n",
"keyWords = []\n",
"# Keywords found in the review being analysed (reset for each review in the main loop)\n",
"miniKey = []\n",
"\n",
"ListNegation = set()\n",
"ListNegation.add(\"pas\")\n",
"\n",
"def addKeys(token, score, review):\n",
"    # Add score to every keyword that appears somewhere in the review\n",
"    for key in keys:\n",
"        if any(t.text == key for t in review):\n",
"            cles = [e[0] for e in keyWords]\n",
"            if key in cles:\n",
"                keyWords[cles.index(key)][1] += score\n",
"            else:\n",
"                keyWords.append([key, score])\n",
"            miniKey.append(key)\n",
"\n",
"def analyseOneWord(token, negation):\n",
"    # print(\"analysing word «\", token, \"». A negation applies to this word:\", negation)\n",
"    score = scoreTable.get(token.text.lower(), 0)\n",
"    if negation:\n",
"        score = -score\n",
"    #print(\"score added\", score, '\\n')\n",
"    return score\n",
"\n",
"def getAllDependancy(token, doc):\n",
"    # Collect the head of the token, its children, and the children of its head\n",
"    listOfDep = set()\n",
"    listOfDep.add(token.head)\n",
"    for child in token.children:\n",
"        listOfDep.add(child)\n",
"    for child in token.head.children:\n",
"        listOfDep.add(child)\n",
"    return listOfDep\n",
"\n",
"def analyseOneReview(Review):\n",
"    reviewScore = 0\n",
"    review = nlp(Review)\n",
"    dejaTraites = {}\n",
"    for token in review:\n",
"        dejaTraites[token] = False\n",
"    # First pass: flip the score of the words linked to a negation word\n",
"    for token in review:\n",
"        if token.text.lower() in ListNegation:\n",
"            dejaTraites[token] = True\n",
"            negToken = getAllDependancy(token, review)\n",
"            for dep in negToken:\n",
"                if not dejaTraites[dep]:\n",
"                    dejaTraites[dep] = True\n",
"                    scoreToken = analyseOneWord(dep, True)\n",
"                    reviewScore = reviewScore + scoreToken\n",
"                    addKeys(dep, scoreToken, review)\n",
"    # Second pass: score the remaining words normally\n",
"    for token in review:\n",
"        if not dejaTraites[token]:\n",
"            dejaTraites[token] = True\n",
"            scoreToken = analyseOneWord(token, False)\n",
"            reviewScore = reviewScore + scoreToken\n",
"            addKeys(token, scoreToken, review)\n",
"    return reviewScore"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-2.0\n"
]
}
],
"source": [
"print(analyseOneReview(\"Il est beau mais il n'est pas heureux\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part three: displaying the results"
]
},
{
"cell_type": "code",
"execution_count": 171,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Format: [[Mot-clé, score associé]]\n",
"\n",
"[['cher', 1.3], ['orientation', 1.3], ['abordable', 2.8000000000000003], ['personnel', 2.5000000000000004], ['orienter', 5.3], ['queue', 4.500000000000001], ['attente', 13.2], ['prix', 7.4], ['raisonnable', 7.4]] \n",
"\n",
"Nombre d'avis: 23 \n",
"\n",
"Score moyen d'un avis: 2.4130434782608696\n"
]
}
],
"source": [
"averageScore=0\n",
"\n",
"for Review in reviews:\n",
"    #print(Review)\n",
"    miniKey = []\n",
"\n",
"    # Look for positive/negative words\n",
"    scoreRev = analyseOneReview(Review)\n",
"\n",
"    #print(scoreRev)\n",
"    averageScore = averageScore + scoreRev\n",
"\n",
"    # Characteristics of the analysed review\n",
"    #miniKey = set(miniKey)\n",
"\n",
"averageScore /= len(reviews)\n",
"print(\"Format: [[Mot-clé, score associé]]\\n\")\n",
"print(keyWords,'\\n')\n",
"print(\"Nombre d'avis: \", len(reviews),'\\n')\n",
"print(\"Score moyen d'un avis: \", averageScore)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}