
The end goal of this project is to help people learn (Spanish) words from TV series subtitles using Quizlet (or another flash card project):




1- Use GitHub

2- Documentation (how to use)

3- Use open-source/freeware/shared scripts or programs when possible

4- This project will have many phases:

phase 1- Python script (create a word list and translate) [+ create a dictionary per language / per season = sum of all word lists, for future applications / statistics]
phase 2- website: subtitle search, check whether it already exists, and create a Quizlet set
phase 3- apply the script to many subtitle languages / TV series with many episodes
phase 4- graphs on the website (statistics of word usage per single subtitle file and across multiple subtitle files)
phase 5- integrate learning progress from Quizlet (or other) to create new lists of words per subtitle file
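
The per-season dictionary from phase 1 (the sum of all per-episode word lists) could be sketched with Python's collections.Counter. This is only a rough sketch: the function name merge_word_counts and the sample counts are hypothetical and not part of the planned script.

```python
from collections import Counter

def merge_word_counts(episode_counts):
    """Sum per-episode word counts into one season-level dictionary."""
    season = Counter()
    for counts in episode_counts:
        season.update(counts)  # adds to existing counts, does not overwrite
    return season

# Hypothetical counts for two episodes (made-up data)
s01e01 = Counter({"hola": 3, "casa": 1})
s01e02 = Counter({"hola": 2, "perro": 4})

season_1 = merge_word_counts([s01e01, s01e02])
print(season_1.most_common())
```

The same merge would work per language, since only the grouping of the input files changes.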


Today: Python script (Phase 1)

1- srt subtitle file >>> remove the time stamps and create 2 files: list of subtitle text + list of "description information" example

2- from the "subtitle words only" file

1- count all the words (summary)

2- count only the unique words (provide stats of words / subtitle text + unique words summary)

3- More advanced: count as the same word: masculine/feminine forms, also "la + noun" / "el + noun" = 1 word, various verb tenses = 1 word

3- create a list + translation

1- translate the words, ordered by their count per subtitle

2- more advanced:

+ add the letter "V" (verb) and add the infinitive form of the verb

+ add the letter "N" (noun), ... >>> for the nouns add "el" or "la"

+ maybe others ...?
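
Steps 1 and 2 of Phase 1 (strip the time stamps, then count all words and unique words) might look roughly like the sketch below. It assumes that "description information" means the index and time stamp lines of the srt file; the names split_srt and count_words and the sample subtitle are hypothetical.

```python
import re
from collections import Counter

# An SRT time stamp line looks like: 00:00:01,000 --> 00:00:03,500
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
TAG = re.compile(r"<[^>]+>")     # formatting tags such as <i>...</i>
WORD = re.compile(r"[^\W\d_]+")  # runs of Unicode letters (keeps á, ñ, ü)

def split_srt(srt_text):
    """Return (text_lines, info_lines): subtitle text vs. index/time stamp lines."""
    text_lines, info_lines = [], []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Caveat: a subtitle line made only of digits is treated as an index line.
        if line.isdigit() or TIMESTAMP.match(line):
            info_lines.append(line)
        else:
            text_lines.append(TAG.sub("", line))
    return text_lines, info_lines

def count_words(text_lines):
    """Count lower-cased words across all subtitle text lines."""
    counts = Counter()
    for line in text_lines:
        counts.update(word.lower() for word in WORD.findall(line))
    return counts

# Made-up sample subtitle
sample = """1
00:00:01,000 --> 00:00:03,500
¿Qué tal? Hola, hola.

2
00:00:04,000 --> 00:00:06,000
<i>La casa</i> es grande.
"""

text_lines, info_lines = split_srt(sample)
counts = count_words(text_lines)
print("all words:", sum(counts.values()))
print("unique words:", len(counts))
```

The "more advanced" grouping (gender, articles, verb tenses) is not covered here; it would need a lemmatizer or dictionary for the language.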

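For step 3 (list + translation, ordered by count), one possible shape of the .csv output is sketched below. No real translation service is called: the translations dictionary is a hypothetical stand-in for whatever translator is chosen later, and the column names are assumptions.

```python
import csv
import io

def write_word_list(counts, translations, out):
    """Write word, translation, count rows, most frequent word first."""
    writer = csv.writer(out)
    writer.writerow(["word", "translation", "count"])
    # Sort by descending count, then alphabetically for a stable order
    for word, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        # Words without a known translation get an empty cell
        writer.writerow([word, translations.get(word, ""), n])

counts = {"casa": 1, "hola": 3, "perro": 1}          # made-up counts
translations = {"hola": "hello", "casa": "house"}    # hypothetical lookup table

buffer = io.StringIO()  # stands in for the output file
write_word_list(counts, translations, buffer)
print(buffer.getvalue())
```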

2- Documentation (how to use)

Usage will look something like this:

wordlistandwordtranslate.py   <TV_serie_S0xE0x.srt>   -h --help


Options:

-o  <serieS01E01.csv>
-d  <description_file.csv>
-e  translate an expression as one word: "Qué tal" and not "Qué" "tal"
--dbo  <other output format, database format, to be defined in the future>

--aato  <add article to nouns in the original language: + el, la>
--aivto  <add infinitive verb form in the original language, format "<SINGLE SPACE> INFINITIVE_VERB_FORM">
--gsv  <group same verb>
--dwi  <display word_info: N for nouns and V for verbs>
--oon  output only the nouns
--oov  output only the verbs
--ootr  output only the rest

--ol  <original_language: esp>
--tl  <target_language: eng>
--st  <second language: romanization or transliteration: PinYin, ...>
--ra  <remove article: el, la, una, ...>

--bstat       Basic statistics: only a summary of word counts
--fstat       Full statistics report: everything listed below
--swstat      Single-word statistics
--swnastat    Single-word statistics without articles (open question: is it needed?)
--swonstat    Single-word statistics, nouns only
--swovstat    Single-word statistics, verbs only
--swotrstat   Single-word statistics, the rest only
--swiestat    Single-word statistics including expressions (like "Qué tal")

--mfl      Multi-file option: list of files
--mfd      Multi-file directory
--mfso     Multi-file single output file (.csv)
--mfmo     Multi-file multiple output files (.csv)
--mstat    Multi-file stats
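
A subset of the options above could be wired up with Python's standard argparse module, roughly as sketched here. Only a few flags are shown; treating "esp" and "eng" as default values, and the dest names, are assumptions. argparse adds -h / --help on its own.

```python
import argparse

def build_parser():
    """Build a parser for a few of the planned options (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="wordlistandwordtranslate.py",
        description="Build a translated word list from an .srt subtitle file.",
    )
    parser.add_argument("srt_file", help="input subtitle file, e.g. TV_serie_S0xE0x.srt")
    parser.add_argument("-o", dest="output", metavar="CSV", help="output word list (.csv)")
    parser.add_argument("-d", dest="description", metavar="CSV", help="description file (.csv)")
    parser.add_argument("-e", dest="expressions", action="store_true",
                        help="translate expressions as one unit")
    # Assumption: "esp" and "eng" are used as defaults here
    parser.add_argument("--ol", default="esp", help="original language")
    parser.add_argument("--tl", default="eng", help="target language")
    parser.add_argument("--oon", action="store_true", help="output only the nouns")
    parser.add_argument("--oov", action="store_true", help="output only the verbs")
    parser.add_argument("--bstat", action="store_true", help="basic statistics")
    return parser

# Parse an explicit argument list instead of sys.argv, for demonstration
args = build_parser().parse_args(
    ["serieS01E01.srt", "-o", "serieS01E01.csv", "-e", "--tl", "eng"]
)
print(args)
```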



Future:

1- create a website to search / download subtitle files (srt)

2- run the preceding script

3- create a Quizlet set ( https://quizlet.com/ )

4- compare lists of files >>> define the delta/diff >> create only the new words
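
The delta/diff step could be as simple as keeping only the words that did not appear in earlier lists. A minimal sketch, with a hypothetical function name new_words and made-up word lists:

```python
def new_words(candidate_words, known_words):
    """Return words from candidate_words that are not in known_words, in order."""
    known = set(known_words)
    seen = set()
    result = []
    for word in candidate_words:
        # Skip words already learned and duplicates within the new list
        if word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result

episode_1 = ["hola", "casa", "perro"]                  # already in a Quizlet set
episode_2 = ["hola", "gato", "casa", "gato", "mesa"]   # new episode

delta = new_words(episode_2, episode_1)
print(delta)  # ['gato', 'mesa']
```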



