Blog - Basic usage of Python

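Magic functions in Jupyter

Magic functions are predefined, special-purpose commands that come from IPython and are available in Jupyter.

They are specific to IPython and are not Python built-ins, so they do not work in other environments such as a plain Python script.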

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
x = np.random.rand(10)
y = np.random.rand(10)
colors = np.random.rand(10)
%matplotlib qt
plt.scatter(x,y,c = colors,alpha=0.7)
#plt.show()

<matplotlib.collections.PathCollection at 0x12d21e580>

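One kind of magic function is line-oriented (a line magic): it starts with a single percent sign (%) and applies only to the line it is on. The other kind is cell-oriented (a cell magic): it starts with two percent signs (%%) and applies to the whole cell.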

`%lsmagic`

lists the magic functions available in Jupyter.

%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

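The %matplotlib inline function

Inline mode tells IPython to render plots directly in the current notebook page, so plt.show() can be omitted. If it is omitted, though, the cell's Out also shows the repr of the object returned by the last line, including its memory address.

The counterpart of inline is qt. With %matplotlib qt, figures open in a separate window that supports interaction such as dragging (this requires a Qt binding such as PyQt to be installed).

The `%timeit` function

Times the execution of a single line of code. This is useful, for example, when evaluating the performance of machine learning algorithms.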

%timeit area = (40*np.random.rand(20))**2

2.13 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

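Interpretation: the statement area = (40*np.random.rand(20))**2 was timed over 7 runs of 100000 loops each; the mean time per loop was 2.13 µs, with a standard deviation of 11.9 ns.

To change the number of loops, add the -n option: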

%timeit -n100 area = (40*np.random.rand(20))**2

The slowest run took 4.02 times longer than the fastest. This could mean that an intermediate result is being cached.
25 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
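Outside IPython, a similar "7 runs, N loops each" measurement can be sketched with the standard library's timeit module. This is only an illustration, not how %timeit is implemented; the timed statement and the loop count of 1000 are assumptions chosen for the example.

```python
import timeit

# Hypothetical statement to time; any short expression works here.
stmt = "sum(i * i for i in range(20))"
loops = 1000

# repeat=7 runs, each executing the statement `loops` times,
# mirroring the "7 runs, N loops each" wording of the %timeit report.
totals = timeit.repeat(stmt, repeat=7, number=loops)

# Each entry in totals is the total time of one run, so divide by the
# loop count to get a per-loop figure.
per_loop = [t / loops for t in totals]
mean = sum(per_loop) / len(per_loop)
print(f"mean per loop: {mean * 1e6:.2f} µs over {len(per_loop)} runs")
```

The printed figure depends on the machine, so no expected output is shown.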

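The `%%writefile` function

Instead of copying code from Jupyter cells into an IDE block by block and saving it as a Python source file, %%writefile writes the contents of a cell to a file directly. The two leading percent signs mean that the whole cell is its scope.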

! pwd

/Users/a182501/quarto/program-chunk/posts

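Shell commands are a text-based way of interacting with the computer through the command line. In Jupyter, simply prefix a command with ! to run it as a shell command.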

files = !ls
print(files)

['1.csv', '111.Rmd', '123.csv', '3.py', '6 linear reg.Rmd', 'Applications', 'CAPM_python_realization.ipynb', 'Calibre Library', 'DIR', 'Desktop', 'Desktop.pem', 'Documents', 'Downloads', 'E:\\工作簿1.csv', 'Figure_1.png', 'Julia入门.ipynb', 'LAB$.docx', 'LAB$.tex', 'LAB4.R', 'Library', 'Movies', 'Music', 'Nutstore Files', 'NutstoreCloudBridge', 'Pictures', 'PycharmProjects', 'RevMan tutorial', 'R语言进阶步骤.ipynb', 'Sites', 'U.tex', 'Untitled-1.py', 'Untitled-1.tex', 'Untitled-2.bcf', 'Untitled-2.log', 'Untitled-2.pdf', 'Untitled-2.tex', 'Untitled.R', 'Untitled.Rmd', 'Untitled.docx', 'Untitled.ipynb', 'Untitled.pdf', 'Untitled0702.tex', 'Untitled1.ipynb', 'Untitled2.Rmd', 'Untitled2.ipynb', 'Zotero', 'a182501.Rproj', 'abc.yml', 'abc2.yml', 'airbnb_python_kaggle.ipynb', 'bernoulli+bayes+likehood.R', 'bin', 'blog', 'cancer.csv', 'cd', 'ciation.json', 'citespace', 'citespace.projects.txt', 'class.csv', 'cm2_mod.R', 'cm_mod.R', 'colors-bar.pdf', 'consumption1.pdf', 'consumption1.tex', 'cv.docx', 'd:pythonpython37libsite-packages', 'data', 'data1964al.xy', 'data\\导出5.csv', 'datas', 'diabetes.csv', 'die_mod.R', 'dismap_mod.R', 'file=..', 'ews.csv', 'geos_mod.R', 'git', 'hello_bundle.zip', 'hello_mod.R', 'house_inf.csv', 'iCloud云盘（归档）', 'import requests.py', 'jianqi.doc', 'jianqi.txt', 'jianqinb.tex', 'julia-1.7.2-linux-i686.tar.gz', 'julia-1.7.2-linux-x86_64.tar.gz', 'julia-1.7.2-mac64.tar.gz', 'lab3.Rmd', 'lab3.md', 'lab3_files', 'lab4.Rmd', 'lab40615.R', 'lab4_0615.Rmd', 'lat_mod.R', 'lianjia_ershou_futian_100.xlsx', 'lianjia_ershou_futian_3.csv', 'list1.txt', 'machine learning.ipynb', 'mydemo.Rmd', 'netural net training.ipynb', 'news.csv', 'opt', 'pfl-data.txt', 'pfl-init2.txt', 'pfl-model.txt', 'plot_grid.R', 'print("hhh").py', 'print("hhhelll").py', 'print("ssss").py', 'project', 'pythonProject', 'python_map_folium.ipynb', 'python学习笔记.ipynb', 'rent.csv', 'rstan.R', 's.yml', 'seaborn-data', 'sensors', 'sppa_mod.R', 'std_mod.R', 'stock_price.csv', 'test.pod', 'test.txt', 
'tk.csv', 'tmdb250.ipynb', 'untitled1', 'vecstore.txt', 'vis_mod.R', 'wget-log', '学习.ipynb', '其他.pod', '随意.py', '导出5.csv', '数据集', '未命名-1.tex', '未命名.ipynb', '文件名.png', '工作簿1.csv', '装修项目.pod', '斐波那契.py', '聚类分析图.R', '相关性分析.doc', '劳动生产率.md', '社会网络分析.ipynb', '抓取新浪财经新闻.R']

Data types

Algorithms + data structures = programs

Strings come with many useful methods; use dir to list them.

dir(str)  # list the methods of the string type

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

help(str.split)

Help on method_descriptor:

split(self, /, sep=None, maxsplit=-1)
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.

str_1 = "I love CUFE"
# Calling .split(" ") on the string splits it at each space
# and returns the pieces as a list
sep_1 = str_1.split(" ")
print(sep_1)

['I', 'love', 'CUFE']

print(sep_1[2])

CUFE

title()

str_2= "hello world hello python"
str_2.title()

'Hello World Hello Python'

format()

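format() uses curly braces and colons inside the string in place of the older C-style % formatting marker. It accepts any number of arguments, and they can appear in the output in a different order from the one in which they were passed.

"{1} {0}".format("hello","python")  # the space between the fields is output too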

'python hello'
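As a small sketch of the braces-and-colon syntax: the part before the colon selects the argument, and the part after it is a format spec. The field names and values below are made up for the example.

```python
# Positional indices let the same argument be reused or reordered.
reordered = "{1} {0} {1}".format("hello", "python")
print(reordered)     # python hello python

# After the colon comes the format spec: width 8, right-aligned, 2 decimals.
padded = "{:>8.2f}".format(3.14159)
print(repr(padded))  # '    3.14'

# Named fields (hypothetical names chosen for this example).
named = "{lang} {version:.1f}".format(lang="Python", version=3.9)
print(named)         # Python 3.9
```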

list

A list is roughly the counterpart of an array in C, and creating one is simple.

list1 = []

type(list1)

list

The items in a Python list do not all have to be of the same type.

list1 = ['math',"中文",99,20]

list1

['math', '中文', 99, 20]

Lists are mutable: their contents can be changed in place.
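A minimal sketch of what mutability means in practice, using made-up data: assigning to an index changes the existing list object, and every name bound to that object sees the change. A string is shown for contrast.

```python
scores = [90, 85, 70]   # hypothetical example data
alias = scores          # a second name for the same list object

scores[1] = 100         # item assignment changes the list in place
print(scores)           # [90, 100, 70]
print(alias)            # [90, 100, 70], both names see the change

# Strings, by contrast, are immutable: item assignment raises TypeError.
word = "abc"
try:
    word[0] = "x"
except TypeError as err:
    print("TypeError:", err)
```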

list1=[1,2,3,4,6,8,9,11]

list1[:4]

[1, 2, 3, 4]

list1[0:9:2]

[1, 3, 6, 9]

list1[::-1]

[11, 9, 8, 6, 4, 3, 2, 1]

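Adding list elements

append() adds one new element to the end of the list.
insert() inserts an element at the given index.
extend() appends all the elements of another list to the end of the list.

All three are in-place operations: the list keeps the same memory address, and the methods modify it rather than creating a new list.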
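The in-place behaviour can be checked with id(), which returns the identity of an object. This is a small sketch with made-up values; the concatenation at the end is included only for contrast.

```python
nums = [1, 2, 3]
before = id(nums)          # identity of the list object before any change

nums.append(4)             # add one element at the end
nums.insert(0, 0)          # insert the value 0 at index 0
nums.extend([5, 6])        # append every element of another list
print(nums)                # [0, 1, 2, 3, 4, 5, 6]
print(id(nums) == before)  # True: still the same object

# Because these methods modify the list in place, they return None.
result = nums.append(7)
print(result)              # None

# Concatenation with + builds a new list instead.
combined = nums + [8]
print(combined is nums)    # False
```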

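Removing list elements

pop() removes and returns the element at the given index (the last one by default).
remove() removes the first element equal to the given value.
clear() removes all elements from the list.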
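A short sketch of the three removal methods on a made-up list:

```python
letters = ["a", "b", "c", "b", "d"]   # hypothetical example data

last = letters.pop()          # removes and returns the last element
print(last, letters)          # d ['a', 'b', 'c', 'b']

first = letters.pop(0)        # removes and returns the element at index 0
print(first, letters)         # a ['b', 'c', 'b']

letters.remove("b")           # removes only the first matching value
after_remove = list(letters)  # copy kept to show the intermediate state
print(after_remove)           # ['c', 'b']

letters.clear()               # empties the list
print(letters)                # []
```

Note that remove() raises ValueError if the value is not present, and pop() raises IndexError on an empty list.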

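Global built-in functions

Besides the methods that belong to the list type itself, Python's built-in functions can also operate on lists. These functions are not specific to list.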

dir(list)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

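fruits=['orange','apple','banana','pear']
sorted(fruits)

['apple', 'banana', 'orange', 'pear']

print(fruits)

['orange', 'apple', 'banana', 'pear']

The built-in function sorted() returns a new sorted list and leaves the original unchanged. The list method sort(), by contrast, changes the original data and works on it in place.

fruits.sort()
print(fruits)

['apple', 'banana', 'orange', 'pear']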

fruits.append("建祺")

max(fruits)

'建祺'

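cmp(list1, list2) compared the elements of two lists in Python 2; it was removed in Python 3, where comparison operators such as == and < are used instead.
list(seq) converts a tuple (or any other iterable) into a list.
zip(list1, list2) combines the elements of several lists into tuples, one tuple per position.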
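The functions above can be sketched as follows in Python 3, with made-up data. Since cmp() does not exist in Python 3, the comparison operators are shown in its place.

```python
names = ["math", "english", "art"]   # hypothetical example data
marks = (99, 80, 75)

marks_list = list(marks)             # tuple -> list
print(marks_list)                    # [99, 80, 75]

# zip() yields its tuples lazily; list() collects them.
pairs = list(zip(names, marks))
print(pairs)                         # [('math', 99), ('english', 80), ('art', 75)]

# Lists compare element by element with the ordinary operators.
same = [1, 2, 3] == [1, 2, 3]
smaller = [1, 2, 3] < [1, 2, 4]
print(same, smaller)                 # True True
```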

Tuples

Unlike lists, which are marked by square brackets, tuples enclose their elements in a pair of parentheses.

tup1 = ()
type(tup1)

tuple

tup1 = (100,)  # a single-element tuple needs a trailing comma
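A quick check of why the trailing comma matters; the variable names are chosen only for this sketch.

```python
not_a_tuple = (100)   # parentheses only group the expression: this is an int
one_tuple = (100,)    # the trailing comma makes it a tuple
also_tuple = 100,     # the comma, not the parentheses, creates the tuple

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
print(one_tuple == also_tuple)     # True
print(len(one_tuple))              # 1
```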

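Efficient comprehensions

A comprehension is a very concise Python construct that derives a new data sequence from an existing one according to some rule.

[expression for variable in sequence_or_iterable]

The outer brackets indicate that the result is a list. The list comprehension written inside the square brackets is equivalent to a loop, only more concise in form.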

alist = [x**2 for x in range(4)]
print(alist)

[0, 1, 4, 9]
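For comparison, the same list can be built with an explicit loop. The variable names below are chosen for this sketch.

```python
# Loop form of [x**2 for x in range(4)]
squares_loop = []
for x in range(4):
    squares_loop.append(x**2)

squares_comp = [x**2 for x in range(4)]
print(squares_loop)                  # [0, 1, 4, 9]
print(squares_loop == squares_comp)  # True
```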

Filtering out elements that don't meet a condition

a_list = [1,'4','9','a',0,6,'hello']
squared_ints = [e**2 for e in a_list if type(e)==int]
print(squared_ints)

[1, 0, 36]

Methods and functions

type(print)

builtin_function_or_method

type(max)

builtin_function_or_method
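As a rough illustration of the distinction: a function such as print is called on its own, while a method is looked up on a type or on an object. The type names printed below are what CPython reports and may differ in other Python implementations.

```python
# print is a built-in function.
function_type = type(print).__name__
print(function_type)           # builtin_function_or_method

# A method looked up on the class itself...
unbound_type = type(str.split).__name__
print(unbound_type)            # method_descriptor

# ...and the same method looked up on an instance (a bound method).
bound = "I love CUFE".split
bound_type = type(bound).__name__
print(bound_type)              # builtin_function_or_method

# Both spellings run the same underlying code.
same_result = str.split("I love CUFE", " ") == bound(" ")
print(same_result)             # True
```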

Flattening nested lists with a list comprehension

Expand every inner list into a single flat list:

vec = [[1,2,3],[3,4,5],[5,6,7]]
flat_vec = [num for elem in vec for num in elem]

print(flat_vec)

[1, 2, 3, 3, 4, 5, 5, 6, 7]
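The order of the two for clauses is easier to see next to the equivalent nested loops. This is a sketch of the same computation written out long-hand.

```python
vec = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]

# The for clauses of the comprehension read left to right,
# in the same order as these nested loops.
flat_loop = []
for elem in vec:          # outer loop: each inner list
    for num in elem:      # inner loop: each number in that list
        flat_loop.append(num)

flat_comp = [num for elem in vec for num in elem]
print(flat_loop)                # [1, 2, 3, 3, 4, 5, 5, 6, 7]
print(flat_loop == flat_comp)   # True
```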

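Building specific lists by combining multiple conditions

A list comprehension consists of a pair of brackets containing an output expression followed by a for clause. Further for and if clauses can follow, which makes many kinds of list comprehensions possible.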

new_list = [(x,y) for x in [1,2,3] for y in [3,1,4] if x != y]
print(new_list)

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

Dictionary comprehensions

mcase ={'a':10,'b':30,'c':50}
kv_exchange ={v:k for k,v in mcase.items()}

print(kv_exchange)

{10: 'a', 30: 'b', 50: 'c'}

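Set comprehensions

A set comprehension is also built around a pair of curly braces. What distinguishes a dictionary is that its elements must appear as key/value pairs.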
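A minimal sketch of the two curly-brace forms side by side, using made-up words:

```python
words = ["apple", "pear", "fig", "kiwi", "plum"]  # hypothetical example data

lengths_set = {len(w) for w in words}      # set comprehension: bare values
lengths_dict = {w: len(w) for w in words}  # dict comprehension: key: value pairs

print(lengths_set == {3, 4, 5})  # True: duplicate lengths collapse in a set
print(lengths_dict["kiwi"])      # 4

# Empty braces create a dict, not a set; use set() for an empty set.
print(type({}).__name__)         # dict
```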